I.  INTRODUCTION


STATEMENT  OF  THE  PROBLEM 

This final report is concerned with the problem of demodulating
multiple frequency-shift-keyed (FSK) signals corrupted by a combination of
multiplicative and additive noise. The multiplicative noisy operator,
which is illustrated in Figure 1, is of such a nature that transmitted
sinusoidal signals $s_i(t)$, $i = 1,\ldots,M$, are converted to zero mean Gaussian
random processes $u_i(t)$, with peak spectral densities at the transmitted
frequencies. The received signal $r_i(t)$ is the sum of $u_i(t)$ and
independent white Gaussian noise $n(t)$. In terms of hypothesis testing
the alternatives are:

$$\mathcal{H}_i : \quad r_i(t) = u_i(t, \underline{\theta}_s) + n(t, \underline{\theta}_n), \qquad 0 \le t \le T, \quad i = 1,\ldots,M \tag{1}$$

where $n(t)$ is an additive white noise process and $u_i(t)$ is a segment
of a zero mean stationary Gaussian random process. A decision must be
made, by analysis of $r(t)$, as to which $\mathcal{H}_i$ is correct in a given
observation interval $[0,T]$. The parameters $\underline{\theta}_s$ and $\underline{\theta}_n$ are the
unknown statistics of the random processes $u_i(t)$ and $n(t)$, respectively.


Figure 1. Combined multiplicative and additive noise.


For purposes of mathematical modeling, these parameters are considered
to be fixed. In fact, they will change slowly enough with respect
to hundreds of observation intervals T that they may be modeled as constant.

In statistical terms, this problem is called "multiple alternative
composite hypothesis testing."¹ The term composite refers to the condition
that each hypothesis $\mathcal{H}_i(\underline{\theta})$ is actually a family of hypotheses over the
range of $\underline{\theta} = (\underline{\theta}_s, \underline{\theta}_n)$. Given knowledge of $\underline{\theta}$, it is possible to partition the
space $\Omega$ of observed processes $r_i(t)$, $0 \le t \le T$, into decision regions $\Omega_i$,
each associated with an hypothesis $\mathcal{H}_i$, in a manner which leads to the
minimum probability of misclassification. Without prior knowledge of $\underline{\theta}$,
there are several alternative approaches, all of which attempt to obtain the
best partitioning on the average by incorporating the available knowledge
about $\underline{\theta}$.

In some cases, the optimal partitioning of the space $\Omega$, which is
the union of decision regions $\Omega_i$, is independent of $\underline{\theta}$ entirely, or at
least independent of some degree of the dimension of $\underline{\theta}$. In these cases,
there is a "uniformly most powerful" test.² Unfortunately, this is not the
case for the problem at hand.

When a probability distribution is postulated for $\underline{\theta}$ (that is, if
$\underline{\theta}$ is considered to be a random variable of known distribution rather than
an unknown constant), one may obtain the best partition of the space $\Omega$ on
the average over that probability distribution. In this approach, which is
known as "Bayesian," the a priori distribution $p(\underline{\theta})$ is updated by the
incorporation of data obtained from $r(t)$ over prior signaling intervals
through the repeated use of Bayes' rule.

A "Bayesian" approach is not strictly applicable to the case of an
unknown but constant $\underline{\theta}$. To derive an adaptive receiver from the Bayesian
point of view would require the assumption of a priori distributions for the
signal-to-noise ratio and frequency dispersion parameters of the channel.
This is not consistent with the rational criteria for evaluating receivers.
We are interested in obtaining the best possible performance at each $\underline{\theta}$
rather than the best performance over some distribution of $\underline{\theta}$.

A third approach, which is more consistent with the nonrandom $\underline{\theta}$
case, is to estimate the value of $\underline{\theta}$ and to treat that estimate $\hat{\underline{\theta}}$ as if
it were, in fact, the actual unknown parameter. An example of this approach
is to use the maximum likelihood estimate $\hat{\underline{\theta}}_{ML}$ in place of $\underline{\theta}$.³ Maximum
likelihood estimates require knowledge of the structure of the distribution
of $r_i(t)$ and the corresponding sampling statistic, but do not require an
a priori distribution for $\underline{\theta}$. The parameter may be considered to be a fixed
but unknown value. However, the use of a maximum likelihood estimate does
not provide any guarantee that it will yield the minimum probability of
misclassification on the average. In that sense, it is merely a heuristic
technique.


The emphasis here is to obtain an adaptive partitioning of $\Omega$
based on information from prior observation intervals through an estimate
of the power spectrum of $r_i(t)$ optimized for receiver performance. The
technique is similar to that using maximum likelihood estimates in that an
estimate is used in place of an actual statistic in an algorithm derived for
known statistics; but it is on more firm theoretical ground since the
estimator is optimized with respect to the receiver performance
(misclassification probability).

The criterion of optimality for the estimator of the power spectrum
is derived by a method related to confidence intervals in the theory of
nonrandom parameter estimation. The optimality of the adaptive receiver is
limited in two ways. First, consideration is given only to a receiver
structured after the optimal (known statistics) receiver with a spectral estimate
replacing the a priori known spectral statistic. Second, the probability of
misclassification, as a function of the correct spectrum and an erroneous
estimate, is assumed to reach its minimum at the true spectrum, to be a
symmetric function about the true value of the spectrum, increasing as a
function of the error, and to approximately translate as a function of the
true spectral value. For reasons that will later become evident, the
symmetry and translation properties are for estimation costs as a function
of the inverse of the spectrum.

An additional objective is to achieve near optimal (known $\underline{\theta}$)
performance with as small a collection of prior observations as possible. The
degree and nature of the nonstationarity of the multiplicative noisy operator
are unknown except that it may be assumed to change slowly with respect to
hundreds of modulation intervals. Thus, the objective of a short prior
memory is to permit the estimation of the time varying parameter and,
parenthetically, to obtain the most rapid convergence consistent with
near-optimal performance by having the wide tracking bandwidth which is made
possible by a short prior observation interval. These objectives are not
incorporated in the optimization criterion, but appear as a constraint on
the design of the estimator. The estimator is arbitrarily constrained to
operate on a single pole filtered version of the prior (raw) spectra and
the time constant is manipulated informally to achieve the desired
trade-off between accuracy and tracking bandwidth.
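
The single pole filtering of prior raw spectra mentioned above can be illustrated with a minimal sketch. It assumes a first-order recursive (exponential) average applied independently to each frequency bin; the function names and the smoothing weight `alpha` are hypothetical and are not taken from the report. A small `alpha` corresponds to a long time constant (more accuracy, narrower tracking bandwidth), a large `alpha` to a short one.

```python
def smooth_spectrum(previous, raw, alpha):
    """One step of a single-pole (exponential) filter, applied per frequency bin.

    previous : current smoothed spectrum (list of floats)
    raw      : newest raw spectrum from the latest observation interval
    alpha    : assumed smoothing weight in (0, 1]; roughly 1 / (memory in intervals)
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return [(1.0 - alpha) * p + alpha * x for p, x in zip(previous, raw)]


def track_spectrum(raw_spectra, alpha):
    """Run the filter over a sequence of raw spectra, starting from the first one."""
    estimate = list(raw_spectra[0])
    for raw in raw_spectra[1:]:
        estimate = smooth_spectrum(estimate, raw, alpha)
    return estimate
```

With `alpha = 0.01`, for example, the filter memory spans on the order of a hundred observation intervals, which is the time scale over which the channel is assumed to be nearly constant.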

In  a  more  rigorous  but  less  practically  realizable  approach  one 
would  have  to  characterize  the  nonstationarity  beyond  noting  that  it  is  slow 
with  respect  to  many  observation  intervals  T,  assign  and  evaluate  the  cost 
associated  with  the  length  of  the  prior  observation  interval,  and  design 
the  dynamics  of  the  estimator  accordingly.  The  approach  taken  here  is 
considerably  "freer"  with  regard  to  knowledge  about  the  evolution  of  the 
parameter $\underline{\theta}$. For estimator design purposes, the parameter is assumed to
be  constant  within  the  "window"  of  the  estimator,  and  no  formal  consideration 
is  given  to  the  nonstationarity  of  the  channel  disturbance. 


The estimator of the power spectrum of the received signal $r_i(t)$,
referred to above, is not an estimator of (analog) power spectral density
(PSD) but of the magnitude-squared discrete Fourier transform of the
process with N samples taken over an observation interval of T seconds
coincident with the transmitted signal. This is referred to loosely here
as the "estimate of the power spectrum" to avoid any confusion between
continuous spectra and finite spectra.
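
As a concrete illustration of the quantity being estimated, the sketch below computes the magnitude-squared DFT of N complex samples directly from the DFT definition. It is only an illustrative sketch: the function name is hypothetical, the DFT is left unnormalized (an assumption, since the report's normalization is defined in its appendices), and a direct O(N²) sum is used for clarity rather than an FFT.

```python
import cmath
import math


def raw_spectrum(samples):
    """Magnitude-squared DFT of one observation interval.

    samples : list of N complex (or real) samples of the received signal.
    Returns a list of N non-negative floats, one per DFT bin.
    The DFT here is unnormalized; this scaling is an assumption.
    """
    n = len(samples)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * m / n)
            for m, x in enumerate(samples)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum
```

A pure complex tone centered on a DFT bin places all of its energy in that bin, while a frequency-dispersed (faded) tone spreads energy into neighboring bins, which is the structure the adaptive receiver tries to learn.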

Since  the  full  dimensional  estimator  is  computationally  unwieldy, 
techniques  of  parametric  spectral  estimation  are  investigated  which  allow 
the  spectrum  to  be  estimated  with  the  use  of  a  small  number  of  storage 
registers  and  associated  arithmetic  operations.  The  parametric  spectral 
estimators  are  designed  to  produce  reduced  dimensional  estimates  with  the 
least  possible  degradation  in  receiver  performance. 

Novel aspects of this study are in its non-Bayesian approach to the
adaptive  demodulation  problem  and  in  the  area  of  linear  data  reduction 
with  negligible  loss  in  receiver  performance.  Additionally,  a  detailed 
analysis  of  the  misclassification  error  in  the  M-ary  zero  mean  Gaussian 
frequency  shift  keying  (FSK)  case  is  presented  which  does  not  appear 
elsewhere. 


In  the  remainder  of  this  section,  the  physical  channel  to  which 
the  adaptive  demodulation  technique  is  directed  is  described;  the  history 
of  the  "Gaussian  signal"  (known  statistics)  problem  and  of  the  adaptive 
receiver  problem  is  outlined.  Section  II  deals  with  the  optimum  known 
statistics  receiver  and  with  a  closely  related  efficient  suboptimal  (known 
statistics)  receiver.  In  Section  III  the  optimality  of  the  spectral  estimator 
is  demonstrated  and  Section  IV  presents  the  adaptive  receiver  with  the  optimal 
estimator  for  the  M-ary  FSK  Gaussian  zero  mean  signal,  in  white  Gaussian 
noise with no data reduction of the required spectrum estimate. Section V
introduces two techniques for reducing the dimension of the spectral estimate
with negligible loss in the receiver performance. Section VI gives simulation
results for the receivers.

THE  PHYSICAL  CHANNEL 

The physical channels are wideband satellite uplinks and downlinks
for data communications in the UHF band. Problems of severe fading associated
with undesirable random phase and amplitude modulation occur during conditions
of unusually high levels of ionization in the ionosphere. These conditions
occur naturally near the equatorial zones and at the North and South Poles,
but may also be caused by man in any region of the Earth by the introduction
of radioactive materials in the atmosphere.

Figure  2  illustrates  the  problem.  The  ionized  cloud  acts  as  a 
kind  of  random  lens,  focusing  and  defocusing  the  downlink  signal  from  the 
satellite.  If  satellite,  cloud,  and  ground  station  were  stationary  with 
respect  to  each  other,  then  the  signal  received  on  the  ground  would  be 
steady but attenuated according to the geometry of the ionized cloud.
Relative motion of the satellite, cloud, or receiver brings about the undesired
modulation.


Figure 2. The physical channel.

The signal path is not direct from the source to the receiver, but
is made up of many reflection paths which add infinitesimal contributions
at the receiving antenna. This type of propagation problem was studied
originally by an analytic technique for a single phase screen by J. A.
Ratcliffe.⁴ Subsequent work by Tatarskii⁵ and by Knepp and Valley⁶ has dealt
with the physical assumptions necessary to derive signal statistics, and
current practice in this area is to estimate the signal spatial
autocorrelation via a multiple phase screen (MPS) calculation through the
computationally efficient two-dimensional Fast Fourier Transform (FFT). MPS
routines allow the computation of signal amplitude and phase realizations at
the receiver for several phase screens in tandem. A considerable amount of
experimentation has been done with these routines and computational results
have evolved over the last few years to the specification of two extreme
conditions for the statistical structure of the received signal $u_i(t)$. At
the two extremes are a Gaussian signal autocorrelation at the receiver, and a
cubic roll-off spectrum.⁷ In either case the first order statistic is
Gaussian. Results are given in Section VI for the performance of the optimal
and conventional receivers with each of these conditions at various
signal-to-noise ratios. The adaptive receivers are all evaluated for the cubic
roll-off spectrum only, since it is the worse case of the two alternatives.

Either of these power spectral densities for $u_i(t)$ may be
characterized by one parameter which is typically given in the time domain
as the $e^{-1}$ point on the autocorrelation function. This parameter is $\tau_0$,
the signal decorrelation time. Only one other parameter enters the
specification of the power spectral density of $r_i(t)$: the noise power spectral
density. Due to an invariance property of the receivers under study to
overall rescaling of the received signal $r_i(t) = u_i(t) + n(t)$, only the
signal-to-noise ratio will be used as an indicator of this second
parameter. The signal-to-noise ratio is given by the ratio of the
autocorrelation of $u_i(t)$ at $t = 0$ to the power spectral density of $n(t)$ in
watts per Hz.

GAUSSIAN  SIGNAL  IN  GAUSSIAN  NOISE  RECEIVERS 

The study of optimal receivers for the demodulation or detection
of Gaussian signals corrupted by additive Gaussian noise has been carried
out for over twenty years. Early discussions were given by Price⁸,⁹ and
Middleton,¹⁰ and the problem was later revived by Kailath,¹¹ Stratonovich and
Sosulin,¹² and Schweppe.¹³ Kennedy¹⁴ produced a small volume on the subject.
These treatments all deal with cases of known statistical structure
of the signal and noise processes. The algorithms process the
continuous-valued time waveform $r_i(t)$, or a sampled version of it, rather than operate
upon a frequency domain representation as considered here in Section II, but
the distinction is of little consequence. In either case the optimal (known
statistics) receiver yields the same demodulation decision on a given sample
of $r_i(t)$. Discrete Fourier transform (DFT) coefficients of a sampled $r_i(t)$
are used here rather than time samples as a computational convenience.

Several communication texts have discussions of the "noise in noise"
demodulation and detection problem. Pertinent chapters and sections can be
found in Van Trees,¹⁵ Helstrom,¹⁶ Whalen,¹⁷ and Hancock and Wintz.¹⁸ The latter
text includes material on adaptive hypothesis testing from the Bayesian
point of view.

The optimal receiver of Price and Kailath was interpreted by them
and several succeeding authors as an extension of the classical matched
filter for deterministic signals. In a matched filter demodulator the
received waveform is crosscorrelated with local replicas of the several
possible transmitted signals, and the most highly correlated of these is
selected as the demodulation decision. Kailath¹¹ showed that the optimal
receiver for detecting a Gaussian random process could be interpreted as a
matched filter where the local replicas are given by minimum variance
estimates of the same incoming waveform. These estimates are done under the
assumption of different hypotheses for the spectrum of the received signal,
so that only on the correct hypothesis is a minimum variance estimate actually
obtained. This interpretation of the optimal receiver has led to its
designation as the "estimator-correlator" receiver, and from this point of
view the study of the receiver has been extended with particular emphasis to
the nature of the estimator under wider classes of signal statistics.

Esposito¹⁹,²⁰ showed that the estimator, in an on-off detection
problem, is minimum variance only if the signal is Gaussian, and proceeded
to give general expressions which show the type of estimate that is required
in the non-Gaussian case. His expressions, which were based on the general
likelihood ratio formula of Kailath²¹ involving Ito stochastic integrals,
apply to random signals of arbitrary distribution in additive Gaussian noise.
The class of signal statistics was specialized to the exponential family by
Schwartz²²,²³ who was then able to use estimators based on sufficient
statistics and give more explicit expressions for the estimator in the
estimator-correlator receiver. This work has been extended to the incorporation of
information from prior observation intervals using a Bayesian approach by
Birdsall and Gobien,²⁴ Gobien,²⁵ and Lee, Nolte, and Hatsell.²⁶
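
The classical matched filter demodulator described earlier in this subsection (crosscorrelate the received waveform with each local replica and select the most highly correlated) can be sketched as follows. This is a minimal illustration for deterministic complex tones, not the estimator-correlator itself. The function names are hypothetical, and taking the magnitude of the correlation (a noncoherent decision that ignores the unknown carrier phase) is an assumption made for the example.

```python
import cmath
import math


def tone(freq_bin, n):
    """Local replica: a complex tone with `freq_bin` cycles over n samples."""
    return [cmath.exp(2j * math.pi * freq_bin * m / n) for m in range(n)]


def correlate(received, replica):
    """Magnitude of the crosscorrelation between received data and a replica."""
    return abs(sum(r * s.conjugate() for r, s in zip(received, replica)))


def matched_filter_decision(received, replicas):
    """Return the index of the replica most highly correlated with the data."""
    scores = [correlate(received, rep) for rep in replicas]
    return max(range(len(scores)), key=scores.__getitem__)
```

In the estimator-correlator interpretation, the fixed replicas above would be replaced by estimates of the incoming waveform formed under each hypothesized spectrum.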

More fundamental discussions of the Bayesian approach to parameter
estimation are given by Keehn²⁷ and Spragins²⁸ who include a table of
reproducing densities. These apply where the sampling density for the unknown
parameter has a sufficient statistic. Spragins proves that there is an
appropriate choice of the form of the a priori density for the unknown
parameter so that the a posteriori density (after updating by Bayes' rule)
retains its functional form. The significance for Bayesian compound
hypothesis testing is that it allows the sufficient statistic to be updated
in a fixed algorithm. Otherwise the algorithm would tend to grow in complexity
as more prior information is introduced. Chien and Fu²⁹ have interpreted
the Bayesian updating procedure in terms of stochastic approximation algorithms
and have proven the convergence of such algorithms in mean square and with
probability one.
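
A standard textbook instance of a reproducing density may help make the point about a fixed updating algorithm concrete. The sketch below assumes real zero-mean Gaussian samples of unknown variance with an inverse-gamma prior on that variance; this particular pairing is chosen for illustration and is not taken from the works cited above. The posterior is again inverse-gamma, so only two numbers need to be stored and updated no matter how many prior samples are absorbed.

```python
def update_inverse_gamma(shape, scale, samples):
    """Bayes update for the variance of zero-mean real Gaussian samples.

    Prior on the variance: inverse-gamma(shape, scale).
    Posterior: inverse-gamma(shape + n/2, scale + sum(x^2)/2).
    """
    n = len(samples)
    return shape + n / 2.0, scale + 0.5 * sum(x * x for x in samples)


def posterior_mean_variance(shape, scale):
    """Mean of an inverse-gamma(shape, scale) density; defined for shape > 1."""
    if shape <= 1.0:
        raise ValueError("the mean exists only for shape > 1")
    return scale / (shape - 1.0)
```

Processing the samples in one batch or interval by interval gives the same posterior, which is the property that keeps the algorithm from growing in complexity.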

A problem with parameter estimation from prior observation intervals
is that the detector or demodulator will have necessarily made some errors in
classifying the prior samples. These errors in turn cause errors in
estimating the required signal parameters. One may choose to ignore this problem,
live with the errors, and derive the properties of the estimator under the
assumption that the priors are correctly classified. This is the approach
taken here with a successful result. The data of Section VI indicate that
there was very little effect on receiver performance from misclassified priors.
A rigorous treatment of the problem of misclassified priors can be found in
a monograph by Patrick and Costello³⁰ with a review of the literature in
the area.

A quite different approach to hypothesis testing of Gaussian
random processes was suggested by Miller and Rochwager.³¹,³²,³³,³⁴ This
diverges from the Price-Kailath algorithm in that no knowledge of the signal
spectrum is required. The center frequency of the spectrum of $u_i(t)$ is
estimated through the proportional relationship of the first moment of the
power spectral density about the origin and the derivative of the
corresponding autocorrelation at its origin (a property of the Fourier transform). The
autocorrelation is estimated from products of uncorrelated pairs of samples
of $r_i(t)$, and a separate noise level estimate is required to remove the
effects of $n(t)$ from the computation. If the power spectral density of
the received signal is known, the technique is clearly suboptimal; and where
good estimates are available it is likely to perform poorly in comparison to
a receiver with an estimator. However, there are situations where spectral
estimation is impossible. For instance, if frequency band hopping techniques
are used and the signal fading characteristics are significantly different
among bands, there may be no rational method to demodulate the signal
other than that suggested by Miller and Rochwager.
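
The moment relationship mentioned above can be illustrated with a simplified sketch. It is not the algorithm of Miller and Rochwager: it assumes complex baseband samples, approximates the derivative of the autocorrelation at the origin by the phase of the lag-one autocorrelation (a finite-difference stand-in), and omits the noise level correction entirely. All names are hypothetical.

```python
import cmath
import math


def lag_one_autocorrelation(samples):
    """Average of products of adjacent sample pairs, x[m+1] * conj(x[m])."""
    pairs = zip(samples[1:], samples[:-1])
    return sum(a * b.conjugate() for a, b in pairs) / (len(samples) - 1)


def center_frequency(samples, sample_interval):
    """Estimate the spectral center frequency in Hz from the lag-one phase.

    Valid only while the true center frequency lies within
    +/- 1 / (2 * sample_interval), since the phase is ambiguous beyond that.
    """
    r1 = lag_one_autocorrelation(samples)
    return cmath.phase(r1) / (2.0 * math.pi * sample_interval)
```

An M-ary decision could then be made by choosing the transmitted frequency nearest the estimate, without any knowledge of the shape of the faded spectrum.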

Adaptive techniques for the filtering of signals without specific
reference to applications in hypothesis testing have an interesting parallel
development to adaptive communication systems. The fundamental problem is
very much the same. Optimal minimum mean square estimators have been
derived for known signal and noise statistics but these must often be
estimated in practice. There are two stages of estimation involved. An
auxiliary estimator is applied to the unknown statistics and the resulting
estimate is used in the primary filtering operation. Heuristic techniques
were first introduced without consideration of the quality of the combined
process. Weaver³⁵ gives several such examples for adaptive Wiener filtering.
Later authors, Balkrishnan³⁶ and Davisson,³⁷ dealt with estimates of the
estimation error and the problems of convergence. In the area of adaptive
Kalman filtering, Mehra³⁸ introduced an adaptive algorithm which also has
guarantees of convergence. Magill³⁹ improved upon the consideration of the
quality of the estimator by restricting the unknown parameters to a finite
set. His algorithm is optimal in the transient period as well as convergent
to the correct statistics. A similar technique was used by Lainiotis⁴⁰ in
the context of adaptive pattern recognition. Like Magill he postulated a
finite collection of possible values for the unknown statistics with given
a priori probabilities. These approaches are both Bayesian in the sense
that they assign a priori probabilities to the unknown statistics.

Fundamental to this problem is the fact that the estimation cost
is a function of the true value of the unknown parameter and is therefore
itself an uncertain value. The Bayesian methods assume a probability
distribution for the unknown parameter and average over it, thereby making the
criterion a nonrandom average cost. Without a prior distribution one must
deal with an ensemble of cost functions to reflect the uncertainty in the
true value of the unknown parameter. The approach taken here is to limit
this ensemble to a confidence region for the parameter and to require that
the estimator minimize the worst case over that ensemble of cost functions.
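
The worst-case criterion just described can be sketched for a scalar parameter. This is only an illustration of the minimax idea under stated assumptions: the confidence region is represented by a finite grid of candidate true values, the cost function is supplied by the caller, and the estimate is chosen from a finite candidate list by exhaustive search. None of these names or choices come from the report, whose actual criterion is developed in Section III.

```python
def worst_case_cost(estimate, region, cost):
    """Largest cost of `estimate` over all true values in the confidence region."""
    return max(cost(estimate, theta) for theta in region)


def minimax_estimate(candidates, region, cost):
    """Candidate whose worst-case cost over the region is smallest."""
    return min(candidates, key=lambda est: worst_case_cost(est, region, cost))
```

For a cost that is symmetric about the true value and increasing in the error, as assumed earlier for the misclassification probability, the search settles on the midpoint of the region, since moving toward either end raises the cost at the opposite end.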


II.  THE  OPTIMAL  RECEIVER  FOR  GAUSSIAN  SIGNALS  IN  AWGN 


BACKGROUND 


The receiver derived here is an obvious extension of the development
in the communications literature, complicated somewhat by the use of the DFT
rather than analog signal processing. It differs from the receiver
introduced by Kailath¹¹ in that the DFT coefficients of the received process $r_i(t)$,
rather than time samples, are used as the raw data. In this section, the
joint probability density of the DFT coefficients of the received
signal-plus-noise $r_i(t)$ is used in the derivation of a maximum a posteriori
probability (MAP) receiver, rather than the corresponding density of time samples
as done by Kailath. The performance is demonstrably identical whether time
samples or DFT coefficients are used, but efficient suboptimal techniques
are expected to be more closely related to the receiver operating on DFT
coefficients than upon time samples.

A few comments on terminology are necessary. The term AWGN stands
for additive white Gaussian noise. The term optimal MPE refers to that
receiver which yields a minimum probability of error for equally likely
alternative signals. The MPE criterion reduces to the MAP criterion,¹ which is
used to derive the receiver. In the derivation of the MAP receiver, the joint
probability density of the received DFT coefficients is viewed as a function
of the index of which signal was transmitted, for a given sample $\underline{z}$ of
received data. $\underline{z}$ is an N-dimensional block of complex numbers which
corresponds to the DFT of a sample of $r_i(t)$ over the demodulation
observation interval T. The index which maximizes the a posteriori probability
is selected as the receiver's decision. Thus, the optimal MPE receiver
must have prior knowledge of the joint density of $\underline{z}$ under each hypothesis
of transmitted signal. This means that the signal dispersion and noise level
are precisely known in advance; where these quantities are estimated, the
receiver is no longer optimal MPE.

RECEIVER  DERIVATION 


A concise development of the optimal MPE M-ary receiver for
Gaussian signals in Gaussian noise is presented here. The preliminaries
appear in Appendices A and B where it is demonstrated that the DFT
coefficients of the received signal-plus-noise are jointly distributed according
to*

$$p(\underline{z} \mid i) = \pi^{-N}\, |\underline{L}_i|^{-1} \exp\left(-\underline{z}^T \underline{L}_i^{-1*} \underline{z}^*\right), \qquad \underline{z} \in C^N \tag{2}$$

where $\underline{z}$ is the vector of DFT coefficients defined by Equation B.3
and the matrix $\underline{L}_i$ is given by†

$$[\underline{L}_i]_{k,\ell} = E\{z_k z_\ell^*\} \quad \text{on the } i\text{th hypothesis.} \tag{3}$$

It is convenient to deal with the log-likelihood function:

$$\ln p(\underline{z} \mid i) = -N \ln(\pi) - \ln|\underline{L}_i| - \underline{z}^T \underline{L}_i^{-1*} \underline{z}^* \tag{4}$$

The first term of the log-likelihood function may be discarded in maximizing
Equation 4 since it is neither a function of the data $\underline{z}$ nor the index
$i$. The second term is a function of the index $i$ but not of the data. Its
practical function is to impart a bias in favor of those signals which are
received with a reduced signal power, a situation which may be expected to
occur in the extreme channels of deviation frequency. In this analysis
the term is not used since its effects have proven to be insignificant
for many performance cases of interest. The third term is a quadratic form
in the data $\underline{z}$ and inverse covariance matrix $\underline{L}_i^{-1}$. The receiver which
uses this term only makes its decision according to

$$\text{decision} = p, \quad \text{where} \quad \min_k \left\{ \underline{z}^T \underline{L}_k^{-1*} \underline{z}^* \right\} = \underline{z}^T \underline{L}_p^{-1*} \underline{z}^* \tag{5}$$

Note that $\underline{L}_i$ is the covariance matrix of the signal-plus-noise DFT
coefficients rather than that of the signal alone. An adaptive receiver must
estimate the signal-to-noise ratio, unlike a conventional noncoherent FSK
receiver which does not incorporate the signal-to-noise ratio and is
nevertheless optimal (in an undisturbed channel) at each noise level.

* The notation $\underline{z} \in C^N$ means "$\underline{z}$ is an N-dimensional complex valued vector."
† The notation $[\underline{L}_i]_{k,\ell}$ should read "the $k,\ell$ element of the matrix $\underline{L}_i$."
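
The decision rule built from the quadratic form can be sketched in a few lines. To keep the example free of matrix inversion, it assumes that each covariance matrix is diagonal, so that a hypothesis is described by one variance per DFT bin; that simplification, and all of the names below, are assumptions for illustration only. The optional log-determinant term corresponds to the second term of the log-likelihood function, which the analysis above discards.

```python
import math


def quadratic_statistic(z, variances):
    """Quadratic form z^T L^{-1*} z^* for a diagonal covariance matrix L."""
    return sum(abs(zk) ** 2 / v for zk, v in zip(z, variances))


def log_det_term(variances):
    """ln|L| for a diagonal covariance matrix L."""
    return sum(math.log(v) for v in variances)


def decide(z, variances_by_hypothesis, use_log_det=False):
    """Index of the hypothesis with the smallest (negative log-likelihood) score."""
    scores = []
    for variances in variances_by_hypothesis:
        score = quadratic_statistic(z, variances)
        if use_log_det:
            score += log_det_term(variances)
        scores.append(score)
    return min(range(len(scores)), key=scores.__getitem__)
```

Because each variance entry is the signal-plus-noise power in that bin, the rule cannot be evaluated without knowing (or estimating) the signal-to-noise ratio, which is the point made in the closing paragraph above.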

PERFORMANCE  ANALYSIS 

An analytical solution to the probability of error for the optimal
MPE receiver does not seem possible* since the integrals involved are not
conveniently expressed in terms of known tabulated functions. When available,
analytic expressions are advantageous as an efficient means to the evaluation
of receiver performance, as the foundation for sensitivity studies, and as a
convenient tool to optimize receiver parameters. A discussion of the
analytical approach, to the extent that it may be carried out, is given here.
Familiarity with the Jacobian method for transformation of multidimensional
random variables is assumed.

* Exponentially tight bounds were obtained for the nonoverlapping signal
spectra case by Viterbi.¹⁷

A receiver performance evaluation typically proceeds from the
joint density of the received signal-plus-noise to the joint density of the
receiver statistics where the error probability may be formulated as an
integral in a simplified form. The received data vector $\underline{z}$ is shown in
Appendix B to be distributed according to the complex normal distribution:

$$p(\underline{z} \mid i) = \pi^{-N}\, |\underline{L}_i|^{-1} \exp\left(-\underline{z}^T \underline{L}_i^{-1*} \underline{z}^*\right), \qquad \underline{z} \in C^N \tag{6}$$

and the receiver statistics are given by

$$q_k = \underline{z}^T \underline{L}_k^{-1*} \underline{z}^*, \qquad k = 1,\ldots,M \tag{7}$$

Let the index set of hypotheses be

$$\mathcal{D}_M \triangleq \{1,2,\ldots,M\} \tag{8}$$

and designate the joint distribution of statistics

$$\underline{q}_M \triangleq [q_1, q_2, \ldots, q_M]^T \tag{9}$$

by $f_M(\underline{q}_M \mid i)$ on the $i$th hypothesis. Then the probability of a correct
modulation decision is

$$P_{c \mid i} = \int_0^\infty \int_{q_i}^\infty \cdots \int_{q_i}^\infty f_M(\underline{q}_M \mid i) \prod_{k \ne i} dq_k \; dq_i \tag{10}$$

A correct decision is made when $q_i$ is smaller than all of the other $q_k$'s.
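
Since the integral for the probability of a correct decision is not available in closed form, a Monte Carlo estimate is one way to evaluate it. The sketch below is illustrative only: it again assumes diagonal covariance matrices (one variance per DFT bin), draws circular complex Gaussian data under the true hypothesis, and counts how often the true hypothesis yields the smallest statistic. The function names, the seed handling, and the diagonal assumption are not from the report.

```python
import random


def draw_complex_normal(variances, rng):
    """Independent circular complex Gaussians with E|z_k|^2 = variances[k]."""
    return [
        complex(rng.gauss(0.0, (v / 2.0) ** 0.5), rng.gauss(0.0, (v / 2.0) ** 0.5))
        for v in variances
    ]


def quadratic_statistics(z, variances_by_hypothesis):
    """One statistic q_k per hypothesis, for diagonal covariance matrices."""
    return [
        sum(abs(zk) ** 2 / v for zk, v in zip(z, variances))
        for variances in variances_by_hypothesis
    ]


def estimate_correct_probability(true_index, variances_by_hypothesis, trials, seed=0):
    """Fraction of trials in which the true hypothesis has the smallest statistic."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        z = draw_complex_normal(variances_by_hypothesis[true_index], rng)
        q = quadratic_statistics(z, variances_by_hypothesis)
        if all(q[true_index] < q[k] for k in range(len(q)) if k != true_index):
            correct += 1
    return correct / trials
```

The accuracy of such an estimate improves only as the square root of the number of trials, which is one reason analytic expressions are preferred when they can be obtained.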


The  transformation  from  p(zji)  to  f^Q^ji)  is  at  least  a  twofold 
reduction  in  dimension.  The  data  z_  is  an  N-dimensional  complex-valued 
vector  where  N  is  the  number  of  complex  samples  per  observation  interval. 
Thus  z.  has  2N  real  numbers  in  its  description  whereas  ^  is  M-dimen- 
sional  and  real -valued  with 


MSN  (11) 

The  number  of  transmitted  frequencies  is  at  most  equal  to  the  DFT  dimension 
V  for  a  conventional  DFT- implemented  M-arv  FSK  receiver.  This  change  in 
dimension  is  significant  to  the  application  of  a  Jacobian  technique.  One 
must  first  transform  the  variables  to  a  space  of  the  same  dimension  and 
then  integrate  over  the  extra  variables. 
We start with Equation 6 and undertake several successive changes
of variable. First, a representation of the distribution of $z$ in polar
form is useful to avoid problems in differentiating the complex distribution.
Define the vectors

$$r \triangleq [r_0, r_1, \ldots, r_{N-1}]^T \qquad (12)$$

$$\theta \triangleq [\theta_0, \theta_1, \ldots, \theta_{N-1}]^T \qquad (13)$$

where

$$z_k = r_k e^{j\theta_k}, \quad k = 0, \ldots, N-1; \qquad r_k \ge 0 \ \text{for all } k \qquad (14)$$

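A transformation from the distribution in $z$ to the corresponding
distribution in $r$ and $\theta$ is most readily accomplished if $z$ is
written in terms of its real and imaginary parts. Let

$$\mathrm{Re}\, z \triangleq [\mathrm{Re}\{z_0\}, \mathrm{Re}\{z_1\}, \ldots, \mathrm{Re}\{z_{N-1}\}]^T \qquad (15)$$

$$\mathrm{Im}\, z \triangleq [\mathrm{Im}\{z_0\}, \mathrm{Im}\{z_1\}, \ldots, \mathrm{Im}\{z_{N-1}\}]^T \qquad (16)$$

then the complex normal distribution may be rewritten in terms of $\mathrm{Re}\, z$, $\mathrm{Im}\, z$: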

p  (Re(£)+jlm(£)  ji) 

=  *  exp  (.-Re  z^iL1  Re  £  * 

T  -1* 

Im  £  L  O  Im  £) 

(17)


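With $J[\mathrm{Re}\, z, \mathrm{Im}\, z]$ denoting the Jacobian matrix, $k, l = 0, \ldots, N-1$, and all elements with $k \ne l$ equal to zero:

$$\big[J[\mathrm{Re}\, z, \mathrm{Im}\, z]\big]_{k+N,\,l} = \frac{\partial\, \mathrm{Im}\, z_k}{\partial r_l} = \sin\theta_k, \quad k = l \qquad (23)$$

$$\big[J[\mathrm{Re}\, z, \mathrm{Im}\, z]\big]_{k,\,l+N} = \frac{\partial\, \mathrm{Re}\, z_k}{\partial \theta_l} = -r_k \sin\theta_k, \quad k = l \qquad (24)$$

$$\big[J[\mathrm{Re}\, z, \mathrm{Im}\, z]\big]_{k+N,\,l+N} = \frac{\partial\, \mathrm{Im}\, z_k}{\partial \theta_l} = r_k \cos\theta_k, \quad k = l \qquad (25)$$

It may be demonstrated that the determinant of the Jacobian is

$$\big|J[\mathrm{Re}\, z, \mathrm{Im}\, z]\big| = \prod_{k=0}^{N-1} r_k (\cos^2\theta_k + \sin^2\theta_k) \qquad (26)$$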

which  establishes  Equation  19. 
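The determinant in Equation 26 can be checked numerically. The following Python sketch is illustrative only and is not part of the report: it assumes the standard polar relations $\mathrm{Re}\, z_k = r_k \cos\theta_k$ and $\mathrm{Im}\, z_k = r_k \sin\theta_k$, and the function names `polar_jacobian` and `determinant` are hypothetical.

```python
import math


def polar_jacobian(r, theta):
    """2N x 2N Jacobian of (Re z, Im z) with respect to (r, theta), z_k = r_k e^{j theta_k}."""
    n = len(r)
    jac = [[0.0] * (2 * n) for _ in range(2 * n)]
    for k in range(n):
        c, s = math.cos(theta[k]), math.sin(theta[k])
        jac[k][k] = c                  # d Re z_k / d r_k
        jac[k + n][k] = s              # d Im z_k / d r_k      (Equation 23)
        jac[k][k + n] = -r[k] * s      # d Re z_k / d theta_k  (Equation 24)
        jac[k + n][k + n] = r[k] * c   # d Im z_k / d theta_k  (Equation 25)
    return jac


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n):
                a[i][j] -= factor * a[col][j]
    return det


# Arbitrary test point (assumed values, N = 3).
r = [0.5, 2.0, 1.5]
theta = [0.3, -1.2, 2.5]
numeric = determinant(polar_jacobian(r, theta))
expected = math.prod(r)  # product of the envelopes, as in Equation 26
```

Comparing `numeric` with `expected` at a few arbitrary points is a quick sanity check on the sign conventions used for the four blocks.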

It is desired to transform the joint distribution $g(r, \theta / i)$ to
a distribution of 2N statistics

q.  (r,9)  =  r\  (9)r  k=0,l,...N-l 

K - -  k -  (27) 

9k  =  9k  k=0, 1 . N-l 

where the set of M hypotheses has been augmented by N-M dummy variables
and the N phase angles are retained in the new density to allow the use of
a Jacobian technique in the transformation. The new distribution is given
by


(  91  Cl.i) ! 

p(aN,i/i)  =  N  LaN'-] 

'  0,  where  no  inverse  image  exists 


(28) 


where the transformation Q, defined by

$$Q : (r, \theta) \rightarrow (q_N, \theta)$$

is nonlinear but has a single-valued inverse for all but pathological
choices of the set of matrices in Equation 27. The new density is zero in the
region where, for a given $(q_N, \theta)$, no corresponding image $(r, \theta)$ exists.
Since the statistics are coupled through the covariance matrices of Equation 2,
not all $(q_N, \theta)$ are possible as images of $(r, \theta)$ under the
transformation Q.


At this stage, the analytic approach begins to break down.
There is no convenient expression for the Jacobian of this transformation
except that which follows from the definition of a determinant. The
Jacobian matrix $J[q_N, \theta]$ at $(r, \theta)$ has the following elements:


to-  ■  2  r,  1&A91  IX  *  L-'A® W 

<-  **  n 

k=0, . . . ,N-1 
,  i- 0 . N-l 

39k  k=0, . . . ,N- 1 

3r7  =  °  ,  £*N, . . . ,  2N- 1 


(29) 




3qk  , 

367  =  r,r£  rn  sln  Jn,£ 

1  ntt 

k=N, . . . ,2N-1 
,  £=0, .  . .  ,N-1 

36k  k=N, . . . ,2N-1 

36j  =  ,  £=N , . . . , 2N- 1 


(30) 


The determinant of the Jacobian matrix involves terms from only the first
of these four expressions, due to the property of determinants$^{41}$

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |A| \cdot |D - C A^{-1} B| \qquad (31)$$

which holds for square matrices with $|A| \ne 0$. Since $B = 0$ and $D = I$, the
right-hand side of Equation 31 reduces to $|A|$. However, the remaining
matrix has generally all nonzero terms, which makes the expression in
Equation 28 for the transformation not useful for an analytical evaluation
of the channel performance.

There  is  a  related  integral  which  may  be  useful  either  as  an 
approximation  to  the  optimal  channel  performance,  as  an  approximation  to 
the  performance  of  the  suboptimal  receiver  introduced  in  Section  III,  or  as 
a  bounding  integral  for  either  of  these.  Without  further  reference  to  its 
ultimate  use,  let  us  just  consider  it  a  related  integral  which  appears  to 
be  more  tractable. 

If the off-diagonal elements of $L_i^{-1}$ are zero, the probability
density of Equation 18 has no dependence on phase angles since they occur
only in off-diagonal elements. Then, upon integrating over the phase
angles, the distribution of envelopes $r$ becomes

$$g(r/i) = 2^N |L_i|^{-1} \prod_{k=0}^{N-1} (r_k)\, e^{-r^T L_i^{-1} r} \qquad (32)$$

which is a joint Rayleigh density (that could be factored into independent
marginal densities under the assumption that $L_i^{-1}$ is diagonal). The
Jacobian of the transformation from $r$ to $q_N$ involves elements from
Equation 29 which under the simplifying assumption are


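$$\frac{\partial q_k}{\partial r_l} = 2 r_l [L_k^{-1}]_{l,l}, \quad k = 0, \ldots, N-1, \quad l = 0, \ldots, N-1 \qquad (33)$$

A matrix $\Gamma$ may be formed from the diagonal elements of the N $L_k^{-1}$ matrices,
including the N-M dummy statistics introduced earlier:

$$[\Gamma]_{k,l} = [L_k^{-1}]_{l,l}, \quad k = 0, \ldots, N-1, \quad l = 0, \ldots, N-1 \qquad (34)$$

then the Jacobian of the transformation from $r$ to $q_N$ is

$$|J_{q_N}(r)| = 2^N \prod_{k=0}^{N-1} (r_k)\, |\Gamma| \qquad (35)$$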


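and the distribution of statistics becomes:

$$f_N(q_N/i) = \begin{cases} |L_i|^{-1} |\Gamma|^{-1} e^{-q_i}, & q_N \in \mathcal{Q} \\ 0, & q_N \notin \mathcal{Q} \end{cases} \qquad (36)$$

where

$$\mathcal{Q} = \{\Gamma x \mid x_i \ge 0,\ i = 1, \ldots, N\} \qquad (37)$$

The $x_i$'s correspond to $|z_i|^2$, or channel received power, and are always
positive. $\mathcal{Q}$ is the set of statistics which are images of some data
vector $z$ and is limited to a region within the positive orthant of N-space
by the positive matrix $\Gamma$ acting on the vector of positive elements

$$x \triangleq [|z_0|^2, |z_1|^2, \ldots, |z_{N-1}|^2]^T \qquad (38)$$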


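Now the channel performance may be expressed, for the special
case of diagonal covariance matrices:

$$P_{c/i} = \int \cdots \int_{\mathcal{Q} \cap \mathcal{P}_i} |L_i|^{-1} |\Gamma|^{-1} e^{-q_i} \, dq_1 \, dq_2 \cdots dq_N \qquad (39)$$

where

$$\mathcal{P}_i = \{q_N \mid q_i \le q_k,\ k = 1, \ldots, M\} \qquad (40)$$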


The matrix $\Gamma$ completely describes the optimal receiver, which is identical
to the SPLOT receiver* for diagonal covariance matrices. The rows of
$\Gamma$ generate the statistics $q_N$ through the relationship

$$q_N = \Gamma x \qquad (41)$$

and $\Gamma$ is perfectly matched to the diagonal covariance matrices via
Equation 34. If an invertible matrix other than $\Gamma$ is used in generating
the statistics, say

$$q_N = \hat{\Gamma} x \qquad (42)$$


* Stationary process-long observation time receiver.
the channel performance integral becomes

$$P_{c/i} = \int \cdots \int_{\mathcal{Q} \cap \mathcal{P}_i} |L_i|^{-1} |\hat{\Gamma}|^{-1} e^{-\gamma_i \hat{\Gamma}^{-1} q_N} \, dq_1 \, dq_2 \cdots dq_N \qquad (43)$$

where $\gamma_i$ is a row vector consisting of the $i$th row of $\Gamma$.

The performance integrals given by Equations 39 and 43 apply to the SPLOT
approximation to the optimal (known statistics) receiver which is introduced
in Section III. They are not tractable integrals and are used only in a formal
manner in the sequel. Equation 37, for a correct receiver matrix $\Gamma$, is
evaluated by numerical methods explained in the remainder of this section.

RECEIVER  PERFORMANCE  -  NUMERICAL  METHODS 

A Monte Carlo technique was developed to evaluate the performance
of the optimal MPE 8-ary FSK receiver with Gaussian zero-mean signals in
AWGN. A sampled-data receiver employing an ideal (impulse) sampler was
used in the analysis. The discussion here is divided into three parts. The
first part deals with the incorporation of signal-to-noise ratio (SNR),
including a derivation of the autocorrelation function of the continuous-time
baseband waveform in terms of the radio-frequency signal spectrum and white
noise level at the receiver. The second part treats the calculation of DFT
covariance matrices based on sampled versions of the baseband autocorrelation
functions, and the third part explains the use of these matrices in obtaining
system performance.

BASEBAND  AUTOCORRELATION  FUNCTIONS  AND  SNR 

The received signal power spectral density (PSD) will be represented
by $\Psi_{r'}(f)$, with the prime designating a waveform that exists prior
to any filtering at the receiver. $\Psi_{r'}(f)$ is a two-sided spectrum with the
units of watts/hertz (W/Hz). The corresponding white noise density will be
designated $N_0/2$ W/Hz. The appropriate definition of signal-to-noise ratio,
where the signal is a segment of a stationary Gaussian process, is given by

$$\mathrm{SNR} = \frac{T \int_{-\infty}^{\infty} \Psi_{r'}(f)\, df}{N_0/2} \qquad (44)$$

where the numerator is the signal energy received during a modulation
interval T, and the denominator is the white noise density. The first
step is to evaluate the autocorrelation function of the baseband analog
signal $x(t) + jy(t)$, defined below, in terms of $\Psi_{r'}(f)$ and $N_0$.

In considering the effect of the baseband filter on the received
signal spectrum, it is convenient to manipulate the block diagram of the
receiver to produce the effect of the baseband filter in front of the mixer.
This artifice helps to avoid some notational complexity. Figure 3 shows the
baseband block diagram for the transformation of the received signal-plus-noise
$r'(t)$ to the complex-valued baseband waveform $x(t) + jy(t)$.

Figure 3. Baseband block diagram.
The two baseband analog filters are denoted by their identical impulse
responses $b(t)$. This block diagram is expressed analytically by the
convolution integral:

$$x(t) + jy(t) = \int_0^\infty b(\tau)\, r'(t-\tau) [\cos\omega_c(t-\tau) + j\sin\omega_c(t-\tau)]\, d\tau = e^{j\omega_c t} \int_0^\infty b(\tau)\, e^{-j\omega_c\tau}\, r'(t-\tau)\, d\tau \qquad (45)$$

The rearrangement of the convolution integral given by Equation 45 suggests
an alternative form for the block diagram with a filter in front of the mixer
as shown in Figure 4.


Figure 4. Alternative block diagram.

Here the post-filter received waveform is identified as

$$r(t) = \int_0^\infty b(\tau)\, e^{-j\omega_c\tau}\, r'(t-\tau)\, d\tau$$

The properties of the Fourier transform:

$$\mathcal{F}\{b(t)\} = \int_{-\infty}^{\infty} b(t)\, e^{-j2\pi ft}\, dt \qquad (46)$$


(47) 
give the result that the transfer function of the filter in Figure 4 is

$$\mathcal{F}\{b(t)\, e^{-j\omega_c t}\} = \mathcal{B}(f + f_c) \qquad (48)$$

where $\omega_c = 2\pi f_c$. In this report we consider an ideal filter with the
magnitude transfer function:

$$|\mathcal{B}(f)| = \begin{cases} 1, & -B < f < B \\ 0, & \text{otherwise} \end{cases} \qquad (49)$$

so that $|\mathcal{B}(f + f_c)|$ becomes

$$|\mathcal{B}(f + f_c)| = \begin{cases} 1, & -B - f_c < f < B - f_c \\ 0, & \text{otherwise} \end{cases} \qquad (50)$$

The relationship of $|\mathcal{B}(f)|$ and $|\mathcal{B}(f + f_c)|$ is illustrated by Figure 5.

Figure 5. Magnitude transfer functions for $\mathcal{B}(f)$ and $\mathcal{B}(f + f_c)$.
Now the effect of the baseband filter is confined to the transfer function
from $r'(t)$ to $r(t)$. Make the following definition for the statistics of
the post-filter received signal-plus-noise $r(t)$:

$$r(t) = s(t) + n(t) \qquad (51)$$

$$E\{r^*(t)\, r(t+\tau)\} \triangleq R_r(\tau) \qquad (52)$$

$$E\{s^*(t)\, s(t+\tau)\} \triangleq R_s(\tau) \qquad (53)$$

$$E\{n^*(t)\, n(t+\tau)\} \triangleq R_n(\tau) \qquad (54)$$

where $s(t)$ and $n(t)$ are the signal and noise terms of $r(t)$. By the
linearity of the expectation operator,

$$R_r(\tau) = R_s(\tau) + R_n(\tau) \qquad (55)$$

where

$$R_s(\tau) = \int_{-\infty}^{\infty} \Psi_{r'}(f)\, |\mathcal{B}(f + f_c)|^2\, e^{j2\pi f\tau}\, df \qquad (56)$$

$$R_n(\tau) = \int_{-\infty}^{\infty} \frac{N_0}{2}\, |\mathcal{B}(f + f_c)|^2\, e^{j2\pi f\tau}\, df \qquad (57)$$

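It remains only to "demodulate" these autocorrelation functions by the
complex mixing operation illustrated in Figure 4 to obtain the corresponding
signal and noise statistics for $x(t) + jy(t)$:

$$E\{s^*(t)\, e^{-j\omega_c t}\, s(t+\tau)\, e^{j\omega_c(t+\tau)}\} = R_s(\tau)\, e^{j\omega_c\tau} = \int_{-\infty}^{\infty} \Psi_{r'}(f)\, |\mathcal{B}(f + f_c)|^2\, e^{j2\pi(f + f_c)\tau}\, df = \int_{-B}^{B} \Psi_{r'}(f - f_c)\, e^{j2\pi f\tau}\, df \qquad (58)$$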
and

$$E\{n^*(t)\, e^{-j\omega_c t}\, n(t+\tau)\, e^{j\omega_c(t+\tau)}\} = R_n(\tau)\, e^{j\omega_c\tau} \qquad (59)$$

These are the required baseband statistics for the waveform prior to the
sampler. In the next section the method of approximation to sample values
of Equations 58 and 59 is discussed.


SAMPLE  AUTOCORRELATION  AND  DFT  COVARIANCE  MATRIX 


The details of the technique for calculating the DFT covariance
matrix from the time-sample autocorrelation function are discussed in
Appendix C. Here the approximation used to obtain the time-sample
autocorrelation function from the signal PSD $\Psi_{r'}(f)$ is treated.

The signal and noise terms (Equations 56 and 57) are handled
separately for computational efficiency, since for each signal PSD several
noise levels are considered. In Appendix C it is indicated that 2N-1 samples,
evenly spaced in the interval $-T \le \tau \le T$, are required from the
continuous-time autocorrelation function, where T is the modulation interval and
N is the number of samples per modulation interval in the receiver. To
approximate the integral of Equation 56 a DFT of much greater dimension
than 2N is used and the result is undersampled to the desired 2N-1 samples.
The approximation to the integral is

J1>. r'(f  *  fc)ej2irfTdf 
where the PSD $\Psi_{r'}(f - f_c)$ is sampled at intervals of 1/(2TP) Hz and a
rectangular approximation is used. The integral is evaluated for N' points
of $\tau$, where $N' = 2TP/\Delta t$ is the dimension of the DFT used, resulting in
the expression


where $R_{s,x+jy}(n)$ is used to denote the approximate time-sample
autocorrelation function of the sample sequence $x(n\Delta t) + jy(n\Delta t)$. Equation 61
is a form of inverse DFT of dimension N' with elements of the sample
sequence set to zero where they fall outside the bandwidth of the baseband
filter. The use of the FFT algorithm will result in N' values of the
autocorrelation function. Of these only 2N-1 points are used, where N = N'/2P is
the number of samples per observation interval T in the sampled-data
receiver. These are the 2N-1 points surrounding the origin.
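The rectangular approximation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the report's program: it evaluates the sum directly at the 2N-1 required lags instead of running an N'-point FFT and discarding the unused outputs, the function name `sample_autocorrelation` is hypothetical, and the Gaussian-shaped PSD in the usage lines is an arbitrary stand-in for the spectra of Section VI.

```python
import cmath
import math


def sample_autocorrelation(psd, bandwidth, T, P, N):
    """Approximate R(n*dt) = integral over (-B, B) of psd(f) exp(j 2 pi f n dt) df.

    psd       : callable, baseband PSD in W/Hz (already shifted by f_c)
    bandwidth : cutoff B of the ideal baseband filter, in Hz
    T         : modulation interval, in seconds
    P         : frequency-grid refinement factor (grid spacing is 1/(2*T*P) Hz)
    N         : time samples per modulation interval (dt = T/N)
    Returns a dict mapping each lag n = -(N-1), ..., N-1 to R(n*dt).
    """
    df = 1.0 / (2.0 * T * P)
    dt = T / N
    k_max = int(math.floor(bandwidth / df))  # PSD samples outside the filter band are dropped
    result = {}
    for n in range(-(N - 1), N):
        total = 0j
        for k in range(-k_max, k_max + 1):
            f = k * df
            total += psd(f) * cmath.exp(2j * math.pi * f * n * dt)
        result[n] = total * df  # rectangular rule: sample value times grid spacing
    return result


# Usage with an assumed unit-power Gaussian-shaped PSD (5 Hz standard deviation).
spread = 5.0
gaussian_psd = lambda f: math.exp(-f * f / (2 * spread ** 2)) / (spread * math.sqrt(2 * math.pi))
R = sample_autocorrelation(gaussian_psd, bandwidth=80.0, T=0.1, P=8, N=16)
```

The cutoff in the usage lines equals the folding frequency N/2T, and the value at lag zero approximates the total signal power passed by the filter.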


The  corresponding  noise  autocorrelation  function  (Equation  57) 
is  approximated  by 


R  .  (n)  =  N  S  o(n) 
n,x  +  jy  o  v 


(62) 


where we have chosen to ignore the contribution of noise autocorrelation at
values other than n = 0. This approximation will be valid where the baseband
filter cutoff is in the neighborhood of the folding frequency N/2T Hz,
which is the cutoff frequency used in this report.


The numerical method of obtaining signal-to-noise ratio normalization
makes use of the fact that arbitrary rescaling of Equations 56 and 57
will not affect the resulting performance. The signal power is
normalized by establishing the equality

$$\sum_k \Psi_{r'}\!\left(\frac{k}{2TP} - f_c\right) = 2TP \qquad (63)$$

via a numerical technique and then setting the noise variance accordingly.
The appropriate value of $N_0$ is obtained by solving Equation 44 for given
values of SNR and T with the signal power set to one watt.
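Solving Equation 44 for the noise level is a one-line rearrangement. The sketch below is illustrative: it assumes SNR is supplied as a linear ratio rather than in decibels, and `noise_density_from_snr` is a hypothetical helper name.

```python
def noise_density_from_snr(snr, T, signal_power=1.0):
    """Solve SNR = T * signal_power / (N0 / 2) for N0, in W/Hz.

    snr          : signal-to-noise ratio as a linear ratio (assumption: not dB)
    T            : modulation interval, in seconds
    signal_power : integral of the two-sided signal PSD, in watts
    """
    if snr <= 0 or T <= 0:
        raise ValueError("snr and T must be positive")
    return 2.0 * T * signal_power / snr


# Assumed example values: SNR of 10 with a 0.1 s modulation interval.
n0 = noise_density_from_snr(snr=10.0, T=0.1)
```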


The signal autocorrelation function obtained by Equation 61 is
arranged in a signal time-sample covariance matrix:

$$R_s = \begin{bmatrix} R_s(0) & R_s(1) & \cdots & R_s(N-1) \\ R_s(-1) & R_s(0) & & \vdots \\ \vdots & & \ddots & \\ R_s(-N+1) & \cdots & & R_s(0) \end{bmatrix} \qquad (64)$$

As shown in Appendix C, the two-dimensional DFT (with some rearrangements)
of the array $R_s$ is the signal term of the DFT covariance matrix:


N  -  1  N  -  1 


E{ 


=jV> =  ~  £  L  vn  -3 

J  N"  n  =  0  m  =  0 


mle 


.  Irkn 

'J  — m—  1 


:*-m 


\  B 
o 


ilk  ,  V 
The noise term affects only the elements on the main diagonal of the DFT
covariance matrix $L_i$ defined by


^1. 


\  F(-  -  *4 
The DFT covariance matrix on the $i$th hypothesis of signal transmitted is
defined by having the element in row k and column l equal to the expected
value of the product of coefficients $z_k$ and $z_l^*$.
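The route from a sampled autocorrelation function to a DFT covariance matrix can be sketched directly from these definitions. The Python below is an illustration, not the Appendix C procedure: it assumes the DFT normalization $z_k = (1/N)\sum_n w_n e^{-j2\pi kn/N}$ for time samples $w_n$, uses a plain quadruple loop instead of a two-dimensional FFT, and the function names are hypothetical.

```python
import cmath
import math


def toeplitz_covariance(R, N):
    """Time-sample covariance with element [n][m] = R(m - n), the layout of Equation 64.

    R is a mapping from integer lag to autocorrelation value, R(tau) = E{w*(t) w(t + tau)}.
    """
    return [[R[m - n] for m in range(N)] for n in range(N)]


def dft_covariance(time_cov):
    """E{z_k z_l^*} for z_k = (1/N) sum_n w_n exp(-j 2 pi k n / N) (assumed normalization)."""
    N = len(time_cov)
    out = [[0j] * N for _ in range(N)]
    for k in range(N):
        for l in range(N):
            total = 0j
            for n in range(N):
                for m in range(N):
                    # time_cov[n][m] is E{w_n^* w_m}; its conjugate is E{w_n w_m^*}.
                    e_wn_wm_conj = complex(time_cov[n][m]).conjugate()
                    total += e_wn_wm_conj * cmath.exp(-2j * math.pi * (k * n - l * m) / N)
            out[k][l] = total / (N * N)
    return out


# Usage: white unit-variance samples give a diagonal matrix with entries 1/N.
N = 4
white = {lag: (1 + 0j if lag == 0 else 0j) for lag in range(-(N - 1), N)}
white_cov = dft_covariance(toeplitz_covariance(white, N))
```

A pure tone centered on DFT bin 1 gives a single nonzero element in row 1, column 1, which is a convenient check on the sign conventions.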
MONTE CARLO PROCEDURE


Using the method described above, a set of M DFT covariance
matrices is computed for each level of signal frequency dispersion and for
each signal-to-noise ratio. The different DFT covariance matrices correspond
to different transmitted deviation frequencies. It is assumed that the
envelope of the received signal spectrum is the same for each transmitted
frequency, that the spectra are symmetrical about the transmitted frequency,
and that they have the detailed functional forms discussed in Section VI.
The DFT covariance matrices are all that is necessary to determine system
performance by the statistical sampling method.

In Appendices D and E the technique to obtain sample vectors $z_i$
distributed according to the complex normal distribution with covariance matrix
$L_i$ is developed. The technique involves the computation of a square root
matrix for each $L_i$, which is used to produce the appropriate degree of
correlation by matrix-vector multiplication with an uncorrelated random complex
normal vector. For each one of many trials a sample vector is obtained.
The sample vector corresponds to a particular hypothesis i of signal
transmitted, level of frequency dispersion, and white noise level.

The sample vectors are tested with decision rules for the conventional
receiver equation, for the optimal MPE receiver equation, and for a
particular suboptimal receiver which is described in Section III. The number
of correct decisions obtained and the total number of trials for each hypothesis
are tallied, and cumulative results are computed over an equal number of
trials of all M hypotheses. Results are given in Section VI.
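The sampling-and-tally loop can be sketched as follows. This is a minimal illustration, not the program used for the report: a Cholesky factor stands in for the square root matrix of Appendices D and E, only the diagonal (SPLOT-type) decision rule is exercised, and every function name and the two-hypothesis example are assumptions.

```python
import math
import random


def cholesky(C):
    """Lower-triangular A with A A^H = C, for a Hermitian positive-definite matrix C."""
    n = len(C)
    A = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k].conjugate() for k in range(j))
            if i == j:
                A[i][j] = complex(math.sqrt((C[i][i] - s).real))
            else:
                A[i][j] = (C[i][j] - s) / A[j][j]
    return A


def sample_complex_normal(A, rng):
    """Draw z = A u, where u has independent unit-variance circular complex normal entries."""
    n = len(A)
    std = math.sqrt(0.5)
    u = [complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(n)]
    return [sum(A[i][k] * u[k] for k in range(i + 1)) for i in range(n)]


def min_statistic_decision(z, gamma):
    """Index of the smallest statistic q_i = sum_k gamma[i][k] * |z_k|^2."""
    x = [abs(v) ** 2 for v in z]
    q = [sum(g * xv for g, xv in zip(row, x)) for row in gamma]
    return min(range(len(q)), key=q.__getitem__)


def monte_carlo_correct_fraction(covariances, gamma, trials, seed=0):
    """Fraction of correct decisions over `trials` sample vectors per hypothesis."""
    rng = random.Random(seed)
    factors = [cholesky(C) for C in covariances]
    correct = 0
    for i, A in enumerate(factors):
        for _ in range(trials):
            if min_statistic_decision(sample_complex_normal(A, rng), gamma) == i:
                correct += 1
    return correct / (trials * len(factors))


def diagonal(values):
    """Diagonal complex matrix with the given main diagonal."""
    return [[complex(v) if r == c else 0j for c in range(len(values))]
            for r, v in enumerate(values)]


# Assumed toy case: M = 2 hypotheses, N = 4 bins, signal power 3 on top of unit noise.
covariances = [diagonal([4, 1, 1, 1]), diagonal([1, 4, 1, 1])]
gamma = [[0.25, 1, 1, 1], [1, 0.25, 1, 1]]
fraction_correct = monte_carlo_correct_fraction(covariances, gamma, trials=2000, seed=1)
```

In this toy case the decision reduces to comparing two exponentially distributed powers with means 4 and 1, so the estimate should settle near 4/5 as the number of trials grows.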


III.  ESTIMATION 


The previous section dealt with a known statistics optimal receiver
quite similar to the one developed by Kailath. It differs only in that the
DFT coefficients of $r_i(t)$ at a given sample rate over the observation
interval (0,T) are considered the raw data rather than time samples. As
previously mentioned, the receiver's decision is identical in either
formulation. The analysis of the misclassification error in Section II concludes
with the derivation of an intractable integral, even for the simplified
case where the off-diagonal elements of the covariance matrix are set
to zero. The receiver performance for rapid fading conditions is available
only through numerical evaluations which are impractical for more than a
small number of points due to the large dimension of the domain space of the
integration (which is sixteen in the case of interest). This situation makes
the derivation of an optimal spectral estimation keyed to the exact receiver
performance impossible or at least impractical. In this section an optimal
estimator is derived under some simplifying assumptions about the receiver
performance integral.

The complete $L_i$ matrix will not have to be estimated but rather
only the elements on the main diagonal. These points are mean values of
magnitude-squared DFT coefficients whereas the off-diagonal elements of
$L_i$ are covariances between different DFT coefficients. The levels of
frequency dispersion, length of the observation interval, and sample rate
of the particular system under consideration are in a realm where an
approximation of the matrix $L_i$, in the receiver algorithm, by another matrix with
the same main diagonal, but set to zero on the off-diagonal, yields
approximately the same receiver performance.
This approximation to the optimal receiver is known as the stationary process
long observation time (SPLOT) algorithm. In Section VI it is experimentally
verified that the SPLOT algorithm yields a receiver performance
indistinguishable by Monte Carlo integration from the optimal (known statistics)
receiver in all cases of interest.

ESTIMATION  COST  FUNCTION 


Two equations are given in Section II for the probability of a
correct decision of the SPLOT algorithm as a function of the true statistics
of the received signal and of the receiver matrix $\Gamma$. Equation 39 pertains
to the performance when a correct receiver matrix $\Gamma$ is used and Equation 43
gives the performance when an erroneous matrix $\hat{\Gamma}$ is used. In either case
the integrals describing the performance are not evidently tractable. The
receiver matrix $\Gamma$ is made up of rows which are the inverse of power
spectrum values for the received signal $r_i(t)$. Actually, only one row of
$\Gamma$ needs to be estimated. The signal frequency dispersion is narrow enough,
in the practical case of interest, and the sampling rate rapid enough that
the M rows of $\Gamma$ which are used in the receiver algorithm can be generated
as rotational shifts of one row. Thus, to implement a SPLOT adaptive receiver
one may consider the estimation of a centered power spectrum of N elements

$$\sigma^{-2} \triangleq [\sigma_0^{-2}\ \ \sigma_1^{-2}\ \cdots\ \sigma_{N-1}^{-2}]^T = [(E\{|z_0|^2\})^{-1}\ \ (E\{|z_1|^2\})^{-1}\ \cdots\ (E\{|z_{N-1}|^2\})^{-1}]^T \qquad (67)$$

where $E\{|z_0|^2\}$ is the power at the center of the spectrum of the received
signal, and $E\{|z_1|^2\}$ and $E\{|z_{N-1}|^2\}$ are the two points on either side,
etc. It is important to note that the receiver requires an estimate of the
inverse of the mean-square magnitude of DFT coefficients rather than the
mean of the inverse which would be quite a different problem.
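Generating the M receiver rows as rotational shifts of one centered row can be sketched as follows. This is an illustration only: `signal_bins`, the list of DFT bins on which the M transmitted frequencies are centered, is a hypothetical input, as are the function name and the example values.

```python
def receiver_matrix_from_spectrum(inverse_spectrum, signal_bins):
    """Build receiver rows as rotational shifts of one centered inverse-power row.

    inverse_spectrum : N values 1 / E{|z_k|^2}, index 0 at the spectrum center
    signal_bins      : DFT bin of each transmitted frequency (hypothetical input)
    """
    n = len(inverse_spectrum)
    return [[inverse_spectrum[(k - b) % n] for k in range(n)] for b in signal_bins]


# Assumed 8-bin example: strong center bin, weaker neighbors, noise elsewhere.
centered_row = [0.25, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5]
gamma_rows = receiver_matrix_from_spectrum(centered_row, signal_bins=[0, 2])
```

Only `centered_row` has to be estimated in such a scheme; each additional hypothesis costs one index rotation.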
The cost function for determining the quality of the estimate $\hat{\sigma}^{-2}$
is defined by the degradation in the probability of a correct decision on
the average over M hypotheses

$$C(\sigma^{-2}, \hat{\sigma}^{-2}) = \frac{1}{M} \sum_{i=1}^{M} \left[P_{c/i}(\Gamma) - P_{c/i}(\hat{\Gamma})\right] \qquad (68)$$

where $P_{c/i}(\Gamma)$ and $P_{c/i}(\hat{\Gamma})$ are probabilities of correct decisions, defined
by Equations 39 and 43 of Section II, for the receivers incorporating correct
statistics and erroneous statistics respectively.

Of course $C(\sigma^{-2}, \hat{\sigma}^{-2})$ is not known since it cannot be practically
evaluated. The objective here is to investigate what assumptions are
necessary regarding this cost function so that an optimal adaptive receiver may be
obtained. From another point of view: if an estimator is specified for
$\hat{\sigma}^{-2}$, what properties of $C(\sigma^{-2}, \hat{\sigma}^{-2})$ must hold for it to be optimal? All
that can be said about the cost function without evaluating it numerically
is that it goes to zero when the correct spectrum is used and that it is
positive for all other values; this statement derives from the optimality
of the known statistics receiver. A primary question to ask is whether
this property can be extended to the individual spectral elements $\hat{\sigma}_k^{-2}$,
k = 0, ..., N-1, since it will greatly simplify the analysis if the estimators
of individual spectral elements can be considered separately.

In general it is not true that the cost function is minimized at
the correct spectral value for an individual spectral estimate $\hat{\sigma}_k^{-2}$. If
all of the other spectral estimates were simultaneously too large or too
small the best estimate of $\sigma_k^{-2}$ would be correspondingly above or below the
true value. This is evident since the receiver decision is insensitive to
overall rescaling of the received signal $r_i(t)$. If all of the estimates
used in the receiver matrix $\hat{\Gamma}$ are off by the same factor, the receiver
decision is still optimal. This fact derives from Equations 38 through 41
of Section II which show that the SPLOT receiver makes its decision by
selecting the minimum element of the vector

$$q_N = \hat{\Gamma} x \qquad (69)$$

where $x$ is the vector of magnitude-squared DFT coefficients computed for a
particular observation interval. If either $\hat{\Gamma}$ or $x$ is rescaled by a
positive factor, the position of the minimum element of $q_N$ is unchanged.
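The scale insensitivity of the decision is easy to confirm numerically. The sketch below is illustrative, with hypothetical names and arbitrary example numbers; it applies the minimum-element rule of Equation 69 before and after rescaling.

```python
def statistics(gamma_hat, x):
    """q = gamma_hat x for a list-of-rows matrix and a vector of |z_k|^2 values."""
    return [sum(g * xv for g, xv in zip(row, x)) for row in gamma_hat]


def decide(gamma_hat, x):
    """Index of the minimum element of q, the decision rule of Equation 69."""
    q = statistics(gamma_hat, x)
    return min(range(len(q)), key=q.__getitem__)


# Assumed example: two hypotheses, three DFT bins.
gamma_hat = [[0.25, 1.0, 1.0], [1.0, 0.25, 1.0]]
x = [3.0, 0.5, 1.0]

baseline = decide(gamma_hat, x)
scaled_matrix = decide([[7.3 * g for g in row] for row in gamma_hat], x)
scaled_data = decide(gamma_hat, [0.01 * v for v in x])
```

All three calls select the same hypothesis, because a common positive factor multiplies every element of the statistic vector equally.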

It is, however, reasonable to assume that if the data used in
estimating each element $\sigma_k^{-2}$ are statistically independent, the estimation
errors will be randomly scattered above and below the correct values so that
the best estimate of a particular $\sigma_k^{-2}$ will occur near its true value with
a high probability. Therefore, the first ad hoc assumption regarding
$C(\sigma^{-2}, \hat{\sigma}^{-2})$ will be that the individual costs in estimating particular
spectral values achieve a minimum at the true value. This allows us to
consider these estimators independently but restricts the estimators to be
based upon statistically independent data.

There is a slight correlation between DFT coefficients of different
frequencies given by the off-diagonal elements of the $L_i$ matrix. These
terms were neglected in making the SPLOT approximation to the optimal receiver
since they had little effect on the receiver decision for the particular
channel conditions considered. Sample $L_i$ matrices which have been
calculated to evaluate the optimal receiver performance show a small correlation
between coefficients which decreases as the frequency separation increases.
The level of correlation is small enough that it does not warrant
consideration in terms of the argument put forth that the estimation costs for
individual spectral elements be considered to achieve a minimum at the true value.
CRITERION OF OPTIMALITY


With the foregoing assumption it is appropriate to consider the
estimation of one spectral element $\sigma_k^{-2}$ independent of all of the others.
What follows is a short discussion of the Bayesian approach to optimal
estimation showing why it cannot readily be developed into a nonBayesian
criterion. An alternative criterion is introduced which is based on
sufficient statistics and the theory of confidence intervals for estimating
nonrandom parameters.

The criterion of estimation quality in a Bayesian approach to
combined parameter estimation and demodulation is the expected value of the
cost function $C[\sigma_k^{-2}, \hat{\sigma}_k^{-2}(Z_k)]$. The average cost or "risk" function is
minimized with respect to the choice of the function $\hat{\sigma}_k^{-2}(Z_k)$ mapping the
available data $Z_k$ into the estimate. The expected value is taken with
respect to the joint density of the random parameter $\sigma_k^{-2}$ and data $Z_k$:*

$$\mathcal{R} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} C[\sigma^{-2}, \hat{\sigma}^{-2}(Z)]\, p(Z, \sigma^{-2})\, dZ\, d\sigma^{-2} \qquad (70)$$

This, of course, requires that a probability density be specified for $\sigma^{-2}$.
Where it is not appropriate to assume a probability density for the unknown
parameter one might consider a risk function similar to Equation 70 which
varies with the parameter value

$$\mathcal{R}(\sigma^{-2}) = \int C[\sigma^{-2}, \hat{\sigma}^{-2}(Z)]\, p(Z/\sigma^{-2})\, dZ \qquad (71)$$

* The k subscript is dropped in most of the remainder of this discussion.
There should be no confusion since the entire discussion deals with the
estimation of the kth element of the spectrum.
but with such a formulation the estimator which minimizes this risk is
inevitably

$$\hat{\sigma}^{-2}(Z) = \sigma^{-2} \qquad (72)$$

since the cost function achieves a minimum where the true parameter is used.

This result is mathematically correct but useless since it calls
for an estimator which incorporates knowledge of the unknown parameter. The
fundamental source of the problem seems to be that $\sigma^{-2}$ appears as a constant
in the right-hand side of Equation 71. In order to mathematically express
the fact that $\sigma^{-2}$ is unknown it should appear as a variable.

We propose a criterion which will allow $\sigma^{-2}$ to be a variable,
based on the theory of confidence limits and sufficient statistics. If a
sufficient statistic exists for the probability density $p(Z/\sigma^{-2})$ then
it is possible to base parameter estimates of $\sigma^{-2}$ on that statistic without
losing any relevant information. In other words, any statistical inferences
on the data $Z$ may equivalently be made on the statistic rather than the
complete data set $Z$. One of several criteria of the sufficiency of a
statistic is that the probability density for $Z$ conditioned on the
knowledge of the statistic be independent of the parameter $\sigma^{-2}$. Here we
mean functionally independent rather than statistically independent, in
keeping with the point of view that $\sigma^{-2}$ is not specified by a probability
density. This implies that the statistic incorporates all of the available
information regarding the unknown parameter.


Sufficiency can be established using the functional form of the
joint density for $Z$ and the form of the sampling statistic. A discussion
of the statistic

$$\bar{z}_k = \sum_{n=1}^{\infty} b_n |z_{kn}|^2 \qquad (73)$$

$$E\{\bar{z}_k\} = E\{|z_k|^2\} = \sigma_k^2 \qquad (74)$$

which is used here to estimate the unknown parameter $\sigma_k^{-2}$ is deferred to
the last part of this section. $\bar{z}$ is an approximation to a sufficient
statistic for the unknown parameter.

The  theory  of  confidence  limits  allows  probabilistic  statements 

-2 

to  be  made  about  the  location  of  the  parameter  a  given  the  value  of  the 

statistic  z.  These  statements  are  derived  from  analysis  of  the  probability 

density  p(z/a  ")*  for  a  fixed  parameter  and  are  then  "turned  around" 

logically  to  make  an  inference  about  the  value  of  a  given  a  sample  z. 

—  -2 

With  knowledge  of  the  form  of  the  sampling  statistic  p(z/o  )  one  may 
compute  the  probability,  for  a  given  e,  whether  z  falls  in  the  interval 

(l+e)a2 

P[(l-e)a2  <  ?  <  (l+e)a2]  =  j"  p(z/a“)dz 

(l-e)a2 

=  1  -  a  (75) 

The probability density $p(\bar{z}/\sigma^{-2})$ sets up a relationship between $\epsilon$ and $\alpha$ such that as $\epsilon$ goes to zero $\alpha$ goes to 1 and as $\epsilon$ goes to infinity $\alpha$ goes to zero. Confidence limits are typically based on the normal density through an invocation of the central limit theorem.42 Such an asymptotic approximation is neither necessary nor desirable in this case.

We wish to consider cases where $\bar{z}_k$ will be computed by a weighting sequence $\{b_n\}$ which is too short for a normal approximation to the density $p(\bar{z}/\sigma^{-2})$.

Since the parameter $\sigma^2$ is a scale parameter for the density in the sense that

$$p_{\bar{z}/\sigma^2}(C\bar{z}/C\sigma^2) = C^{-1}\,p_{\bar{z}/\sigma^2}(\bar{z}/\sigma^2) \qquad (76)$$

the relationship between $\alpha$ and $\epsilon$ given by Equation 75 does not change as a function of $\sigma^2$. This can be seen by making the change of variable from $\bar{z}$ to $y = \sigma^{-2}\bar{z}$ in the integral:

$$\int_{(1-\epsilon)\sigma^2}^{(1+\epsilon)\sigma^2} p_{\bar{z}/\sigma^2}(\bar{z}/\sigma^2)\,d\bar{z} = \int_{1-\epsilon}^{1+\epsilon} p_{\bar{z}/\sigma^2}(\sigma^2 y/\sigma^2)\,\sigma^2\,dy = \int_{1-\epsilon}^{1+\epsilon} p_{\bar{z}/\sigma^2}(y/1)\,dy \qquad (77)$$

* Notice that $p_{\bar{z}/\sigma^2}(\bar{z}/\sigma^2) = p_{\bar{z}/\sigma^{-2}}(\bar{z}/\sigma^{-2})$ and that these notations are used interchangeably with the subscripts deleted.


The confidence coefficient $1 - \alpha$ expresses the probability that a sample $\bar{z}$ is within a factor $1-\epsilon$ to $1+\epsilon$ of the mean value $\sigma^2$ regardless of the magnitude of $\sigma^2$ according to this formulation.

The probabilistic statement regarding $\bar{z}$ for a given $\sigma^2$ may be conveniently converted to a corresponding statement regarding $\sigma^{-2}$ given a sample of $\bar{z}$. Note that $\bar{z}$ and $\sigma^2$ are both positive. If

$$(1-\epsilon)\sigma^2 < \bar{z} < (1+\epsilon)\sigma^2 \qquad (78)$$

then

$$1-\epsilon < \bar{z}\,\sigma^{-2} < 1+\epsilon \qquad (79)$$

or

$$\frac{1-\epsilon}{\bar{z}} < \sigma^{-2} < \frac{1+\epsilon}{\bar{z}} \qquad (80)$$

and the confidence interval may be used to put an upper bound on the cost given a sample of the statistic $\bar{z}$ and a confidence coefficient $1 - \alpha$.
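The scale invariance expressed by Equation 77 can be checked numerically. The sketch below assumes, purely for illustration, that $\bar{z}$ is an unweighted average of n independent magnitude-squared complex Gaussian samples, so that $n\bar{z}/\sigma^2$ follows an Erlang density; the weighted statistic of Equation 73 would need its own density. The function names are hypothetical.

```python
import math


def erlang_cdf(x, n):
    """CDF at x of the sum of n independent unit-mean exponential variables."""
    if x <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total


def confidence_coefficient(eps, n):
    """1 - alpha = P[(1-eps)*sigma2 < zbar < (1+eps)*sigma2].

    Assumes zbar is the unweighted mean of n exponential samples with mean
    sigma2. sigma2 cancels (Equation 77), so it is not an argument.
    """
    lower = n * max(1.0 - eps, 0.0)
    upper = n * (1.0 + eps)
    return erlang_cdf(upper, n) - erlang_cdf(lower, n)


def inverse_interval(zbar, eps):
    """Interval for sigma^-2 implied by Equation 80 for a sample zbar."""
    return (1.0 - eps) / zbar, (1.0 + eps) / zbar
```

Because `confidence_coefficient` takes only `eps` and `n`, the same pair ($\epsilon$, $\alpha$) applies at every noise level, which is the point of the change of variable.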


The upper bound is derived in the following manner: Once $\sigma^{-2}$ is confined to a symmetric interval about $\bar{z}^{-1}$ the cost is upper bounded by

$$B_F(\alpha) = \max_{\sigma^{-2} \in \mathcal{B}(\alpha)} C[\sigma^{-2}, F(\bar{z}^{-1})] \qquad (81)$$

$$\mathcal{B}(\alpha) = [\bar{z}^{-1}(1-\epsilon),\ \bar{z}^{-1}(1+\epsilon)] \qquad (82)$$

where $\mathcal{B}$ is the set of values for $\sigma^{-2}$ prescribed by the confidence coefficient $1 - \alpha$, and $F(\cdot)$ is an instantaneous function mapping the (inverse) sufficient statistic $\bar{z}^{-1}$ into an estimate of $\sigma^{-2}$. With this definition an equivalence is set up between the events $\{C[\sigma^{-2}, F(\bar{z}^{-1})] < B_F(\alpha)\}$ and $\{\sigma^{-2} \in \mathcal{B}(\alpha)\}$, both conditioned on $\bar{z}$, so that the confidence limit applies equally to the event that $\sigma^{-2}$ is on the interval $\mathcal{B}(\alpha)$ and the event that the cost is upper bounded by $B_F(\alpha)$ given a specific value for the statistic.

The object of optimization then is to find the function $F(\bar{z}^{-1})$ which yields the smallest upper bound $B_F(\alpha)$ for a given confidence coefficient in the equation

$$P(C[\sigma^{-2}, F(\bar{z}^{-1})] < B_F(\alpha)) = 1 - \alpha . \qquad (83)$$

Figure 6 shows a mapping from $\bar{z}^{-1}$ to $F(\bar{z}^{-1})$ to $C[\sigma^{-2}, F(\bar{z}^{-1})]$. The two cost curves for $\sigma^{-2} = (1-\epsilon)\bar{z}^{-1}$ and $\sigma^{-2} = (1+\epsilon)\bar{z}^{-1}$ are indicated, and the figure illustrates that an upper bounding cost $B_F(\alpha)$ will be achieved on one of the two extreme cost curves of the interval.


The conclusion that one of the two extreme curves on the interval will set the maximum cost for a given $F(\bar{z}^{-1})$ on the family of curves over the interval is justified on the assumption that the cost is monotonically increasing as a function of the distance of $\sigma^{-2}$ from $F(\bar{z}^{-1})$, and that the individual cost curves in the family of curves with $\sigma^{-2}$ on the interval $[(1-\epsilon)\bar{z}^{-1}, (1+\epsilon)\bar{z}^{-1}]$ are properly "nested" in the sense that

$$\frac{\partial C[\sigma^{-2}, F(\bar{z}^{-1})]}{\partial \sigma^{-2}} \ \begin{cases} > 0 & \text{for } F(\bar{z}^{-1}) > \sigma^{-2} \\ < 0 & \text{for } F(\bar{z}^{-1}) < \sigma^{-2} \end{cases} \qquad (84)$$

for any fixed $F(\bar{z}^{-1})$. This is a fairly loose regularity condition which merely implies that the estimation cost does not reach a particularly extreme condition for some $\sigma^{-2}$ on the interval.
Figure 6. Mapping from $\bar{z}^{-1}$ to $F(\bar{z}^{-1})$ to cost functions at the extremes of the confidence interval.


Under this regularity assumption the optimal function $F(\bar{z}^{-1})$ mapping the sufficient statistic would be somewhat above or below a linear function $F(\bar{z}^{-1}) = \bar{z}^{-1}$ and would occur at the crossover of the extreme curves to yield the minimum $B(\alpha)$ as illustrated in Figure 6. Furthermore, the optimal $F(\bar{z}^{-1})$ would vary with the interval used and therefore with the confidence coefficient. Therefore it is evident that in order to specify an optimal $F(\bar{z}^{-1})$ additional assumptions regarding $C[\sigma^{-2}, F(\bar{z}^{-1})]$ are required. Figure 7 illustrates the more restricted condition.

If $C[\sigma^{-2}, F(\bar{z}^{-1})]$ is a symmetric function about $\sigma^{-2}$,

$$C[\sigma^{-2}, \sigma^{-2}+\delta] = C[\sigma^{-2}, \sigma^{-2}-\delta] \qquad (85)$$

and if the cost function "translates" over the interval

$$C[\sigma^{-2}, F(\bar{z}^{-1})] = C[\sigma^{-2}+\delta, F(\bar{z}^{-1})+\delta] \quad \text{for } \sigma^{-2} \in \mathcal{B}(\alpha) \text{ and } \sigma^{-2}+\delta \in \mathcal{B}(\alpha) \qquad (86)$$

then the cost curves originating at the extremes of the interval will always cross over in the center and $F(\bar{z}^{-1}) = \bar{z}^{-1}$ is the optimal estimator of $\sigma^{-2}$ for the specified criterion in that it yields the lowest bound on the estimation cost for a given confidence coefficient.

Figure 7. Mapping from $\bar{z}^{-1}$ to $F(\bar{z}^{-1})$ to cost functions at the extremes of the confidence interval for symmetric, translating cost functions.
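The crossover argument can be illustrated with a small numerical search. The sketch below is only an illustration under stated assumptions: it takes a squared-error cost as an example of a symmetric, translating cost, it relies on the nesting condition so that the worst case occurs at an end of the interval of Equation 82, and all function names are hypothetical.

```python
def worst_case_cost(estimate, zbar, eps, cost):
    """Largest cost over the two extreme values of sigma^-2 (Equation 82)."""
    lower = (1.0 - eps) / zbar
    upper = (1.0 + eps) / zbar
    return max(cost(lower, estimate), cost(upper, estimate))


def best_estimate(zbar, eps, cost, steps=2001):
    """Grid search for the estimate with the smallest worst-case cost."""
    lower = (1.0 - eps) / zbar
    upper = (1.0 + eps) / zbar
    grid = [lower + (upper - lower) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda f: worst_case_cost(f, zbar, eps, cost))


def squared_error(true_value, estimate):
    """A symmetric, translating cost used only as an example."""
    return (true_value - estimate) ** 2
```

For such a cost the search settles on the center of the interval, that is on the inverse of the sample `zbar`, whatever value of `eps` is chosen.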


A criterion of optimality suitable for the case of an unknown but nonrandom variable has been stated and shown to apply under the following assumptions about the cost function $C(\cdot)$:

1. The cost for estimating individual elements of the vector $\underline{\sigma}^{-2}$ achieves the minimum at the true value.

2. The cost function $C[\sigma^{-2}, F(\bar{z}^{-1})]$ for estimating individual elements of the vector is symmetric about the true value and invariant in shape, that is $C[\sigma^{-2}, F(\bar{z}^{-1})] = C[|\sigma^{-2} - F(\bar{z}^{-1})|]$, over a suitable range of possible values of $\sigma^{-2}$ to apply a confidence limit criterion.

3. The cost function increases monotonically as a function of the distance of $\sigma^{-2}$ from $F(\bar{z}^{-1})$.

Under these assumptions $F(\bar{z}^{-1}) = \bar{z}^{-1}$, that is, the inverse of the average of square magnitude DFT coefficients, is an optimal estimator of $\sigma^{-2}$ in that it yields the smallest upper bound on the cost for any confidence interval of $\sigma^{-2}$ for which the above assumptions hold.

The conditions on the cost function under which $F(\bar{z}^{-1}) = \bar{z}^{-1}$ is optimal are not precisely satisfied by $C(\sigma^{-2}, \hat{\sigma}^{-2})$ but will be a good engineering approximation if a large enough DFT dimension is used and if the confidence interval is small enough.

THE STATISTIC $\bar{z}$


The statistic $\bar{z}$ defined by Equation 73 is a weighted average of prior samples of magnitude squared DFT coefficients of the received signal $r_i(t)$ at a fixed frequency. It is not a sufficient statistic for the priors since it is a weighted average rather than a true average.


That an unweighted average of $|z_{k,n}|^2$ terms is a sufficient statistic for a finite length sample $\underline{Z}_k$ may be determined by an application

2 

of the Neyman-Fisher factorization theorem. The factorization theorem is a necessary and sufficient condition for the sufficiency of a statistic. The essence of the criterion is that the joint density of the data $\underline{Z}$ be factorable into two nonnegative functions, one of which does not depend on the unknown parameter, and the other which may involve the unknown parameter, but which depends on $\underline{Z}$ only through the sufficient statistic. The priors $\underline{Z}$ are distributed according to a joint complex normal distribution

$$p(\underline{Z}) = \pi^{-N}\,|D|^{-1}\,e^{-\underline{Z}^{\dagger} D^{-1} \underline{Z}} \qquad (87)$$

where the elements of the covariance matrix $D$ are

(88)
The DFT coefficients from different sampling intervals are statistically independent if frequency band hopping is used from one observation interval to the next. They are approximately independent, depending on the signal fading rate, in a system where no hopping is used. If they are assumed to be independent and identically distributed, $D$ is a diagonal matrix with identical elements on the main diagonal

$$D = dI \qquad (89)$$

and

$$p(\underline{Z}) = \pi^{-N} \prod_{n=1}^{N} d^{-1}\,e^{-|z_{k,n}|^2/d} \qquad (90)$$

The statistic

$$T = \sum_{n=1}^{N} |z_{k,n}|^2 \qquad (91)$$

is sufficient since $p(\underline{Z})$ satisfies the factorization theorem for $T$. It depends on the data $\underline{Z}$ only through $T$. By the factorization theorem it is allowed to have two factors, one of which may depend on $d$ but depends on $\underline{Z}$ only through $T$. The other factor may depend on $\underline{Z}$ in any way but does not depend on $d$. This latter factor may be taken as

$$g(\underline{Z}) = 1 \qquad (92)$$

in this case.
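The factorization can be illustrated numerically: under the independent, identically distributed model of Equation 90, two data sets with the same value of $T$ have the same density for every $d$. The following is a minimal sketch with hypothetical function names, assuming the density of Equation 90 with variance `d` per sample.

```python
import math


def log_density(samples, d):
    """Log of Equation 90: i.i.d. complex normal samples with variance d."""
    return sum(-math.log(math.pi * d) - abs(z) ** 2 / d for z in samples)


def sufficient_statistic(samples):
    """T of Equation 91: the sum of magnitude-squared samples."""
    return sum(abs(z) ** 2 for z in samples)


def log_density_from_statistic(t, count, d):
    """The same log density written in terms of T and the sample count only."""
    return -count * math.log(math.pi * d) - t / d
```

A weighted sum with unequal weights cannot replace `sufficient_statistic` here, since the exponent of Equation 90 weights every sample equally.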


The statistic $\bar{z}$ is not sufficient since the weights $b_n$ are unequal, and the factorization criterion does not hold. The reason for using unequal weights is that our confidence that the parameter $d$ remains constant diminishes with time over the sample of priors. Thus the weighted average $\bar{z}$ is as close as we are willing to come to a sufficient statistic.


IV.  THE  ADAPTIVE  M-ARY  FSK  RECEIVER 


The  adaptive  M-ary  FSK  receiver  patterned  after  the  SPLOT  algorithm 
is  discussed  with  emphasis  on  deviations  from  the  ideal  algorithm  which  make 
the  receiver  computationally  practical.  A  receiver  with  a  full  dimension 
estimator  is  considered  here.  There  is  no  admixture  of  estimates  among 
different  DFT  coefficients.  Data  reduction  techniques  which  further  improve 
upon the computational efficiency of the receiver are taken up in Section V.

ESTIMATING  FILTERS 

A detailed algorithm for the adaptive receiver incorporating data reduction techniques is given in Appendix F. Here is a discussion of a receiver without such data reduction with emphasis on some features that are common to receivers with and without data reduction. The receiver without data reduction has been experimentally verified to be equal or inferior in performance to those which incorporate some smoothing across DFT coefficients. The full dimensional estimation algorithm given here is significant as a step in the evolution of a practical adaptive receiver.

Figure 8 is a block diagram of the adaptive receiver which traces the flow of computations from the sample DFT coefficients

$$z_{k,m} = \sum_{n=0}^{15} (x_{n,m} + j y_{n,m})\,e^{-j 2\pi k n / N}, \qquad k = 0,\ldots,15 \qquad (93)$$

to the receiver's decision variable $\ell_{\min}(m)$. In the double index notation $(k,m)$ the index of frequency is $k$ and the index of the observation interval (chip index) is $m$. The index of time samples within an observation interval is $n$. The complex-valued combination of in-phase and quadrature baseband waveforms $x_{n,m} + j y_{n,m}$ is defined by Equation 45 of Section II.* A receiver with specific dimensions of N=16 complex samples per observation interval and M=8 alternative signals is used for illustration.
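As a concrete reading of Equation 93, the sketch below computes the DFT coefficients of one observation interval directly from the complex baseband samples. It is a plain, unoptimized illustration (an FFT would normally be used), and the function names are hypothetical.

```python
import cmath


def dft_coefficients(samples):
    """Equation 93 for one interval: z_k = sum_n s_n * exp(-j*2*pi*k*n/N)."""
    n_points = len(samples)
    return [
        sum(s * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n, s in enumerate(samples))
        for k in range(n_points)
    ]


def magnitude_squared(coefficients):
    """Magnitude-squared coefficients |z_k|^2 used by the receiver."""
    return [abs(z) ** 2 for z in coefficients]
```

Each entry of `samples` stands for one value $x_{n,m} + j y_{n,m}$, so a list of 16 complex numbers corresponds to the N=16 case used for illustration in the text.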


The first operation on the complex-valued samples $z_{k,m}$ from the mth observation interval is to convert them to magnitude-squared coefficients $|z_{k,m}|^2$. These are used in connection with the current estimate $\hat{\sigma}^{-2}_{k,m-1}$ of the inverse spectrum to determine the receiver's modulation decision by computation of M inner products

$$P_{\ell}(m) = \sum_{k=0}^{N-1} |z_{k,m}|^2\,\hat{\sigma}^{-2}_{|k-\ell|,\,m-1}, \qquad \ell = \{-4,-3,\ldots,3\} \qquad (94)$$

and the selection of the index of the minimum of these

$$P_{\ell_{\min}}(m) \le P_{\ell}(m), \qquad \ell = \{-4,-3,\ldots,3\} \qquad (95)$$

as the modulation decision $\ell_{\min}$. The notation $|k-\ell|$ is a modulo addition which means that the estimates are shifted in a circular or wraparound sense to form the M different inner products. The second index is $m-1$ rather than $m$ since the estimate must necessarily be based upon the observations leading up to but not including the current interval.
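The decision rule of Equations 94 and 95 can be sketched in a few lines. This is a minimal illustration, assuming the inverse-spectrum estimate is stored with the center of the signal spectrum at index 0 and that the modulo operation is an ordinary circular index; the function name and argument layout are hypothetical.

```python
def modulation_decision(mag_sq, inv_spectrum, shifts=range(-4, 4)):
    """Return (l_min, scores) for one observation interval.

    mag_sq       -- magnitude-squared DFT coefficients of the current interval
    inv_spectrum -- inverse-spectrum estimate built from earlier intervals only
    shifts       -- candidate modulation indices, here -4, ..., 3
    """
    n_points = len(mag_sq)
    scores = {}
    for shift in shifts:
        # Inner product with the estimate circularly shifted by `shift`.
        scores[shift] = sum(
            mag_sq[k] * inv_spectrum[(k - shift) % n_points]
            for k in range(n_points)
        )
    best = min(scores, key=scores.get)
    return best, scores
```

A strong coefficient lowers the inner product only when it lines up with a small entry of the inverse spectrum, which is why the minimum is selected.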


In the range of rapid signal fading for which the optimal (known statistics) receiver will operate, signal energy is effectively spread no more than three DFT coefficients from the center of the spectrum. There is a guard band of four DFT coefficients on either side of the eight possible center frequencies so that all of the signal energy is always within the baseband in the useful range of the receiver. This circumstance allows the use of circular shifts of the estimated inverse spectrum $\hat{\underline{\sigma}}^{-2}$ since only the flat inverse noise level is wrapped from one end of the baseband to the other by that operation. Theoretically a 23 element estimate of the spectrum would be required, 16 elements for the signal and noise spectrum at an extreme deviation frequency and 7 additional elements for spectral points shifted into the baseband as the center of the spectrum occupies the seven remaining locations. Since the additional elements required are estimates of noise levels only, these are conveniently taken from the opposite end of the spectrum by the circular shift operation.

* An impulse sampler is assumed.


The same circumstances allow the estimates of the signal spectrum to be based upon spectra centered by a circular shift operation. Using the modulation decision at the mth observation interval, the samples $|z_{k,m}|^2$ are realigned by the operation

(96)

When the modulation decision is correct, the center of the signal spectrum is shifted to the first entry of the realigned set $|z'_{0,m}|^2$. The next higher frequency is shifted to the k=1 location and the next lower frequency to the k=15 location, etc. A fraction of the raw samples are incorrectly aligned when modulation level errors occur. These errors have proven to be inconsequential to the receiver performance.


At this stage, the algorithms incorporating data reduction differ from the one currently being described. In those algorithms the aligned samples are subjected to a data reduction operation before insertion into averaging filters. For the full dimensional estimate at hand, each of the sixteen DFT coefficients is input to a single-pole recursive filter:


$$\overline{|z'_{k,m}|^2} = K\,\overline{|z'_{k,m-1}|^2} + (1-K)\,|z'_{k,m}|^2 \qquad (97)$$

where $K<1$ is the pole location of the filter. The average coefficients $\overline{|z'_{k,m}|^2}$ are inverted to form the estimate of the inverse spectrum

(98)
The block diagram shows a time delay T following the inversion. This is a convention which allows a computational sequence to be represented in a block diagram. The time delay is implicit in the order of computations. Since the current demodulation decision is used to realign the current samples $|z_{k,m}|^2$, the estimate of the spectrum used to demodulate the mth sample is necessarily based only on prior samples up to the (m-1)th.
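The estimator update described above (realign, average, invert) can be sketched as follows. This is an illustration under assumptions, not the algorithm of Appendix F: the realignment is taken to be a circular shift by the modulation decision, the filter is taken to be the unity-gain single-pole form with pole `pole`, and all names are hypothetical.

```python
def realign(mag_sq, l_min):
    """Circularly shift so the decided spectrum center lands at index 0."""
    n_points = len(mag_sq)
    return [mag_sq[(k + l_min) % n_points] for k in range(n_points)]


def update_average(average, aligned, pole):
    """One step of a single-pole recursive average for every coefficient."""
    return [pole * a + (1.0 - pole) * x for a, x in zip(average, aligned)]


def inverse_spectrum(average):
    """Point-by-point inversion of the averaged spectrum."""
    return [1.0 / a for a in average]
```

Calling `inverse_spectrum` on the averages from interval m-1 before `update_average` is applied for interval m reproduces the ordering that the block diagram represents with its time delay.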


V.  DATA  REDUCTION 


Several  stages  of  investigation  have  led  up  to  this  final  expository 
section  in  which  a  useful  receiver  is  introduced.  Let  us  review  these 
briefly  to  put  the  work  reported  here  in  perspective. 

The  objective  of  all  of  this  research  is  to  achieve  a  practical 
receiver  with  near  optimal  performance  for  rapid  signal  fading  conditions. 

The  first  object  of  study  was  the  optimal  receiver  for  demodulating  an  M-ary 
FSK  signal  in  rapid  signal  fading  and  additive  white  noise.  Although  it 
cannot  be  practically  implemented  due  to  the  incorporation  of  a  priori 
knowledge  of  the  received  signal  and  noise  spectrum,  the  performance  of  the 
optimal  receiver,  in  terms  of  probability  of  misclassification  error,  is 
useful  as  a  benchmark  for  the  evaluation  of  practical  adaptive  receivers. 

The optimal performance, along with the conventional M-ary FSK receiver's performance (in rapid signal fading), provides a band of acceptable performance for the proposed receiver. Candidate adaptive receivers should significantly
outperform  the  conventional  receiver,  or  the  simpler  conventional  algorithm 
would  be  preferred.  On  the  other  extreme,  the  proposed  adaptive  receiver 
will  necessarily  not  perform  as  well  as  the  optimal  receiver  but  should 
fall  near  the  optimal  extreme  of  the  acceptable  band  of  performance.  These 
two  bounds  are  used  for  evaluating  adaptive  receiver  performance  in  Section  VI 
where  experimental  results  are  discussed. 

As an offshoot of the optimal receiver algorithm, a more efficient suboptimal algorithm was discussed. The so-called SPLOT algorithm eliminates the need for certain matrix inversions of the optimal receiver by using the approximation that off-diagonal terms of the matrix may be set to zero without noticeable effect on the receiver performance. This result has been experimentally established for the channel conditions of interest and is documented in Section VI. The SPLOT receiver is also not a practical receiver. It incorporates a priori information about the received signal and noise random processes in the same way as the optimal algorithm.

One  candidate  technique  for  an  adaptive  receiver  was  introduced 
in  Section  IV.  That  receiver  substitutes  a  spectral  estimate  for  the  a  priori 
known  spectrum  of  the  SPLOT  receiver.  The  spectral  estimate  is  formed  by 
averaging  the  spectra  over  prior  observation  intervals  with  the  use  of  single 
pole  recursive  filters.  It  is  implemented  with  a  decision  feedback  technique. 
The  current  modulation  decision  is  used  to  determine  which  sample  is  averaged 
with  the  center  of  the  spectrum,  next  higher  and  lower  frequency,  etc. 

Results  in  Section  VI  show  that  it  performs  near  optimally  with  an  averaging 
time  constant  of  about  40  prior  observation  intervals.  Some  drawbacks  of 
this  first  cut  adaptive  receiver  are  explained  here  and  two  improved  receivers 
are  introduced  which  use  a  reduced  dimensional  representation  of  the  received 
spectrum. 

REPARAMETERIZATION  OF  THE  SPECTRAL  ESTIMATE 

The spectral estimate incorporated in the first proposed adaptive receiver (which is referred to here as a 16-parameter estimate since the dimension of the raw spectral estimate is 16 in our simulations) is cumbersome from a computational point of view. Sixteen independent estimates are formed in the receiver, and each of these requires its own storage register and associated arithmetic operations. A means of reducing this computational load while not further degrading the receiver performance is discussed here. A reduced dimensional parametric spectral representation is used, where the parameters of the representation are estimated rather than the raw spectral density itself. Depending on the signal-to-noise ratio, experimental results have shown either an improved receiver performance or an identical performance for parametric spectral estimates of four or five dimensions versus the original 16-dimensional estimate.

Only  two  parameters  enter  the  description  of  the  received  random 
process  which  is  input  to  the  simulations  of  receivers  evaluated  in  Section  VI. 
Thus,  a  reduction  to  two  dimensions  is  the  best  that  can  be  expected  from 
reparameterization.  One  of  these  parameters  describes  the  fading  rate  of 
the  process  and  the  other  gives  the  signal-to-noise  ratio.  Although  these 
two  parameters  are  certainly  the  most  efficient  representation  of  the 
spectrum  in  terms  of  dimension,  since  they  come  from  physically  independent 
sources,  they  are  not  useful  for  the  parametric  spectral  estimate.  One  of 
the  two  enters  the  functional  description  of  the  spectrum  in  a  nonlinear  manner. 
To analyze the data in terms of these parameters requires the solution of nonlinear equations (an iterative solution) which is impractical in a real-time receiver.

The original motivation to reduce the dimension of the spectral estimate was not just to improve computational efficiency but to obtain improved receiver performance from a more accurate spectral estimate. The raw spectra are made up of essentially independent random variables, and by averaging over these independent samples one expects to reduce the variability of the estimate. It is shown in this section that an improvement in receiver performance with reduced dimensional estimates versus the 16-parameter estimate is not guaranteed on an analytical basis. However, experimental results reported in Section VI show a slight improvement due to parametric spectral estimation at lower signal-to-noise ratios.

Computational  savings  are  brought  about  with  parametric  spectral 
estimation  when  the  numerical  conversion  from  raw  spectra  to  the  reduced 
dimensional  representation  is  efficient,  and  when  the  conversion  can  precede 
averaging  over  prior  observation  intervals.  If  the  parameter  conversion 
is a linear operation it may be exchanged in order with the average over
priors.  In  addition,  if  the  data  reduction  process  is  a  projection  operator, 
it  can  be  accomplished  efficiently.  The  investigation  of  data  reduction 
algorithms  is  therefore  confined  to  linear  projection  operators  in  the 
following  discussion. 


Parametric spectral estimation implies that the spectrum will be represented in a functional form with variable parameters. The parameters of the spectrum become the object of estimation rather than the spectral density itself over frequency. If the functional form is linear in the parameters, that is if the spectral estimate is expressed as

$$\hat{\underline{\sigma}}^2 = \sum_{i=1}^{p} c_i\,\underline{x}_i \qquad (99)$$

where the set of $c_i$ are the estimated parameters and $\underline{x}_i$ are a set of standard functions, the parameterization is linear. For a computationally efficient analysis, the vectors $\underline{x}_i$, $i = 1,\ldots,p$ should form an orthonormal set. When the $\underline{x}_i$ are orthonormal, the coefficients $c_i$ may be computed from the raw estimate, designated $\hat{\underline{\sigma}}_s^2$, by forming p inner products

$$c_i = \underline{x}_i^T\,\hat{\underline{\sigma}}_s^2 \qquad (100)$$

and then computing the estimate by Equation 99.


The composite operation of computing the $c_i$'s and reconstructing a spectrum $\hat{\underline{\sigma}}^2$ is

$$\hat{\underline{\sigma}}^2 = \sum_{i=1}^{p} \underline{x}_i\,\underline{x}_i^T\,\hat{\underline{\sigma}}_s^2 \qquad (101)$$

which may be expressed in vector-matrix notation as

$$\hat{\underline{\sigma}}^2 = K\,\hat{\underline{\sigma}}_s^2 \qquad (102)$$

where

$$K = \sum_{i=1}^{p} \underline{x}_i\,\underline{x}_i^T \qquad (103)$$

By construction, $K$ is a linear projection operator. The term projection derives from the fact that $K$ projects any N-vector into the subspace spanned by the set of $\underline{x}_i$. Our next concern is with the design of an appropriate set of $\underline{x}_i$ vectors to use for data reduction.
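The projection of Equations 99 through 103 is easy to exercise on a toy case. The sketch below uses a hypothetical two-vector orthonormal set in four dimensions (not the set designed in this report) to show the analysis and reconstruction steps, that projecting twice changes nothing, and that projection may be exchanged with averaging.

```python
def inner(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(basis, raw):
    """Coefficients c_i = x_i . raw, then the estimate sum_i c_i * x_i."""
    coefficients = [inner(x, raw) for x in basis]
    estimate = [
        sum(c * x[k] for c, x in zip(coefficients, basis))
        for k in range(len(raw))
    ]
    return coefficients, estimate


# Hypothetical orthonormal set: a flat level and a two-level step.
EXAMPLE_BASIS = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
]
```

Only the p coefficients need to be stored and averaged over prior intervals, which is the source of the computational saving when p is much smaller than N.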


The approach followed here to the design of this orthonormal set is necessarily less than ideal since the sensitivity function of the receiver performance to errors in the spectral estimate is an intractable integral (see Equation 43). The receiver performance is only available through a Monte Carlo integration over an N-dimensional space requiring several hundred trials per integral. To design an optimal orthonormal set of $\underline{x}_i$'s would require the evaluation of the second cross partial derivatives of the receiver performance as a function of deviations in the spectral estimate about the true spectrum. The number of computations involved to obtain these sensitivities to a useful degree of accuracy would be astronomical by a Monte Carlo procedure.


In lieu of obtaining an optimal solution, the analysis is carried out formally as if the sensitivity function were available and a deviation is made from the ideal approach which leads to suboptimal solutions that do not involve the unavailable sensitivity functions. In the course of the formal analysis it is found that the performance cost can be broken down into two components: one is due to the covariance of the estimate and the other is due to bias error in the reparameterization. An orthonormal set of $\underline{x}_i$ is determined that is guaranteed to result in almost zero bias error, with a considerably reduced dimension, and which is not a function of the unknown sensitivity. However, the resulting projection operator $K$ is not necessarily optimal in terms of the total covariance and modeling (bias) error. The practical advantage of this suboptimal solution is demonstrated experimentally in Section VI.

OPTIMIZATION  CRITERION 


The probability of error in the receiver as a function of the actual statistics and the incorrectly estimated statistics of the received random process is used as a cost function in a straightforward optimization. The problem is stated formally even though the optimal $K$ cannot be determined since the practical methods of data reduction are derived as an offshoot of the ideal approach. The following development flows from the definition of cost in terms of misclassification of samples using a particular spectral estimate, to the derivation of an approximate average cost or risk over an ensemble of spectral estimates. A Taylor series expansion of the cost function about the actual statistics is used in developing the risk, and terms of greater than second order are neglected.


In Section II an expression for the probability of error in terms of an arbitrary receiver matrix $\hat{\Gamma}$ and the optimal receiver matrix $\Gamma$ was introduced. It gives the probability of a correct decision on the ith hypothesis of transmitted signal

$$P_{c/i}(\Gamma, \hat{\Gamma}) = \int\!\!\int\!\cdots\!\int (\cdots)\; dq_1\,dq_2 \cdots dq_N \qquad (104)$$

The reader is referred to Section II for the definitions of the various symbols in Equation 104. It suffices here to note that the expression relates both the $\hat{\Gamma}$ and $\Gamma$ matrices to the receiver performance. Analysis upon this expression is only carried out formally since it is an intractable integral.


The average over the M hypotheses is

$$P_c(\Gamma, \hat{\Gamma}) = \frac{1}{M} \sum_{i=1}^{M} P_{c/i}(\Gamma, \hat{\Gamma}) \qquad (105)$$

A loss function may be defined as the difference in the probability of a correct decision when using $\Gamma$ versus $\hat{\Gamma}$ in the receiver

$$\mathcal{L}(\Gamma, \hat{\Gamma}) = P_c(\Gamma, \Gamma) - P_c(\Gamma, \hat{\Gamma}) \qquad (106)$$

and the risk function is the expected value of the loss over the ensemble of random estimates $\hat{\Gamma}$.

$$\mathcal{R}(\Gamma, K) = E\{\mathcal{L}(\Gamma, \hat{\Gamma}(K))\} \qquad (107)$$

The risk function is dependent on the way that the estimate is formed. Here the risk is symbolically given as a function of $K$, the projection operator used, where $\hat{\Gamma}(K)$ is the random estimate derived through a given projection $K$.


The loss function is expanded in a Taylor series about $\hat{\Gamma} = \Gamma$ reflecting an interest in the behavior of the receiver only in the neighborhood of the optimal point $\Gamma$. The elements of $\Gamma$ and $\hat{\Gamma}$ are inverse spectra but, in the interests of linearity, the spectral estimates will be made on noninverted spectra and the resulting estimates will be inverted for use in the receiver. The partial derivatives are, accordingly, taken with respect to the inverses of elements of $\hat{\Gamma}$ (noninverted spectral points). Expanding the loss function:


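$$\mathcal{L}(\Gamma, \hat{\Gamma}) = P_c(\Gamma, \Gamma) - P_c(\Gamma, \hat{\Gamma})$$

$$\approx -\sum_{i,j} \left.\frac{\partial P_c}{\partial \hat{\Gamma}_{i,j}^{-1}}\right|_{\hat{\Gamma}=\Gamma} (\hat{\Gamma}_{i,j}^{-1} - \Gamma_{i,j}^{-1}) - \frac{1}{2} \sum_{i,j} \sum_{k,l} \left.\frac{\partial^2 P_c}{\partial \hat{\Gamma}_{i,j}^{-1}\,\partial \hat{\Gamma}_{k,l}^{-1}}\right|_{\hat{\Gamma}=\Gamma} (\hat{\Gamma}_{i,j}^{-1} - \Gamma_{i,j}^{-1})(\hat{\Gamma}_{k,l}^{-1} - \Gamma_{k,l}^{-1}) \qquad (108)$$

Since $\hat{\Gamma} = \Gamma$ is optimal, the first partial terms must go to zero. Terms of third and higher order are neglected. Then the risk function is approximately equal to the expectation of the second order term with $\hat{\Gamma}$ replaced by $\hat{\Gamma}(K)$.

$$\mathcal{R}(\Gamma, K) \approx -\frac{1}{2} \sum_{i,j} \sum_{k,l} \left.\frac{\partial^2 P_c}{\partial \hat{\Gamma}_{i,j}^{-1}\,\partial \hat{\Gamma}_{k,l}^{-1}}\right|_{\hat{\Gamma}=\Gamma} E\{(\hat{\Gamma}_{i,j}^{-1}(K) - \Gamma_{i,j}^{-1})(\hat{\Gamma}_{k,l}^{-1}(K) - \Gamma_{k,l}^{-1})\} \qquad (109)$$

With this approximation the risk is formulated as the sum of covariance terms for the estimate $\hat{\Gamma}$ weighted by coefficients from the second partial derivative of the channel performance function $P_c(\Gamma, \hat{\Gamma})$.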


There are some practical considerations related to the particulars
of sample rate, number of transmitted frequencies, tone separation, and
consequent degree of frequency dispersion over which the receiver can be
expected to operate, which allow the spectral estimate to be made on an
N-dimensional spectrum, where N is the number of samples per observation
interval. This is not necessarily true in general. There are, in our
simulations, N = 16 spectral points in each sample spectrum, but modulation
of the center of the spectrum over M = 8 locations brings the total number
of spectral points that the receiver must estimate up to 23. Only ten of
these spectral points are always available to sample; the others are
sometimes modulated outside the skirts of the baseband filter. It happens that
the signal portion of the received spectrum is always within these ten points
if the fading rate is less than the cutoff rate for the optimal receiver.
These circumstances allow the receiver's internal estimate of the spectrum
to be conveniently made on an N = 16 dimensional basis, with the M = 8
alternative spectra generated by rotational shifts of one spectral estimate.
(There is no problem of wrap-around of the signal portion of the spectrum
with these dimensions. Only the flat noise portion of the spectrum is
wrapped around by the rotational shift.)
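
The rotational-shift idea can be illustrated with a short sketch. This is only an
illustration under stated assumptions: the function name `hypothesis_spectra`, the
shift direction, and the placement of the signal peak at bin 4 are hypothetical
choices, not taken from the report. It also reproduces the count of
N + M - 1 = 23 distinct spectral points.

```python
import numpy as np

def hypothesis_spectra(estimate, num_tones=8):
    """Return a (num_tones, N) array whose row m is `estimate` rotated by m bins.

    Assumption: one tone spacing equals one DFT bin, and the shift direction
    is arbitrary (chosen here as a positive circular shift).
    """
    estimate = np.asarray(estimate, dtype=float)
    return np.stack([np.roll(estimate, m) for m in range(num_tones)])

N, M = 16, 8
estimate = np.ones(N)               # flat noise floor (hypothetical values)
estimate[3:6] = [4.0, 9.0, 4.0]     # narrow signal portion centered on bin 4
spectra = hypothesis_spectra(estimate, M)

# Distinct spectral points covered by M shifted N-point windows.
total_points = N + M - 1
```

With these dimensions the signal portion occupies bins 3 through 12 across all
eight shifts, so only the flat noise portion is wrapped around.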


In terms of the $\Gamma$ matrix these 16 points may be considered to
come from either of its two centermost rows. It is convenient to use a
vector notation for the spectrum with


(110) 

(111) 


then the risk may be expressed as the trace of the product of two N × N
matrices


(112) 


(113) 
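\[
[\mathbf{M}]_{ij} = E\{(\hat{\sigma}_i^2 - \sigma_i^2)(\hat{\sigma}_j^2 - \sigma_j^2)\},
\qquad i = 1,\ldots,N, \quad j = 1,\ldots,N
\]

or, in vector notation

\[
\mathbf{M} = E\{(\hat{\boldsymbol{\sigma}}^2 - \boldsymbol{\sigma}^2)
               (\hat{\boldsymbol{\sigma}}^2 - \boldsymbol{\sigma}^2)^T\}
\]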

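where $\boldsymbol{\sigma}^2$ is the true spectrum and $\hat{\boldsymbol{\sigma}}^2$
is the spectral estimate used in the receiver. The full dimensional estimate
($\mathbf{K} = \mathbf{I}$) will be designated $\tilde{\boldsymbol{\sigma}}^2$ to
distinguish it from the reduced dimensional estimate $\hat{\boldsymbol{\sigma}}^2$.
The estimate $\tilde{\boldsymbol{\sigma}}^2$ results from a point-by-point single pole
recursive average over prior spectra and is (to within a negligible adjustment
factor) an unbiased estimate. The reduced dimensional estimates generally will
introduce some bias. Notationally,

\[
E\{\tilde{\boldsymbol{\sigma}}^2\} = \boldsymbol{\sigma}^2
\]
\[
E\{\hat{\boldsymbol{\sigma}}^2\} = \bar{\boldsymbol{\sigma}}^2
\]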


With  these  practical  considerations  introduced  and  various 
notations  established,  we  proceed  with  the  analysis. 

PROJECTION  OPERATOR 

In a practical receiver it will be necessary to reduce the dimension
of the spectral estimate (from N = 16 in our simulations) to a much smaller
number, say three to five parameters, in the interest of computational
efficiency. Improved performance may also be achieved, but that is not
the primary objective of data reduction. A linear projection operator
represented by the N × N matrix $\mathbf{K}$, such that


K  =  XX 
and the remaining term is the contribution to the error from bias introduced
in reparameterization. Note that for $\mathbf{K} = \mathbf{I}$, corresponding to no
data reduction, this term goes to zero. The linearized risk can also be separated
into these two terms

\[
\mathcal{R}(\boldsymbol{\sigma}^2,\mathbf{K}) =
  \mathrm{tr}\{\mathbf{C}_\sigma \mathbf{K}\mathbf{D}\mathbf{K}^T\}
  + \mathrm{tr}\{\mathbf{C}_\sigma (\mathbf{K}-\mathbf{I})
      \boldsymbol{\sigma}^2 \boldsymbol{\sigma}^{2T} (\mathbf{K}-\mathbf{I})\}
\qquad (123)
\]

Note that the risk is a function of the particular fading rate and
signal-to-noise ratio embedded in $\boldsymbol{\sigma}^2$. To minimize the risk as it
stands would result in a $\mathbf{K}$ which is a function of the unknown statistic,
a useless result. One would have to find the average risk over an ensemble of
$\boldsymbol{\sigma}^2$ to obtain the optimal $\mathbf{K}$. Since $\mathbf{C}_\sigma$
is unavailable at any $\boldsymbol{\sigma}^2$, it is not possible to carry out this
optimization. However, Equation 123 is useful in delineating the two sources of
degradation in receiver performance, and in showing the linear nature of the
mixture to a second order approximation of the Taylor series expansion of the
cost function about the actual spectrum.
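
The two-term split of Equation 123 can be made concrete with a small numerical
sketch. Everything numeric here is hypothetical: the weighting matrix `C`, the
estimator covariance `D`, and the spectrum `sigma2` are random stand-ins, and the
function name `linearized_risk` is an illustrative choice. The sketch assumes, as
Equation 123 implies, that the bias of the reduced estimate is
$(\mathbf{K}-\mathbf{I})\boldsymbol{\sigma}^2$.

```python
import numpy as np

def linearized_risk(C, D, K, sigma2):
    """Return (covariance_term, bias_term) of the two-term risk split."""
    n = len(sigma2)
    cov_term = np.trace(C @ K @ D @ K.T)
    bias_vec = (K - np.eye(n)) @ sigma2            # modeling (bias) error vector
    bias_term = np.trace(C @ np.outer(bias_vec, bias_vec))
    return cov_term, bias_term

rng = np.random.default_rng(0)
N = 16
A = rng.standard_normal((N, N))
C = A @ A.T                                # hypothetical symmetric weighting
D = np.diag(rng.uniform(0.1, 1.0, N))      # hypothetical estimator covariance
sigma2 = rng.uniform(0.5, 2.0, N)          # hypothetical true spectrum

# No data reduction: the bias term vanishes.
cov_full, bias_full = linearized_risk(C, D, np.eye(N), sigma2)

# A crude rank-one projection onto a flat spectrum: bias appears.
x = np.ones((N, 1)) / np.sqrt(N)
cov_red, bias_red = linearized_risk(C, D, x @ x.T, sigma2)
```

The rank-one case shows the trade described in the text: the projection changes
the covariance term while introducing a nonzero modeling term.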


As an alternative to computing the optimal $\mathbf{K}$, two approaches
to derive a suboptimal $\mathbf{K}$ are suggested. In each of these methods, only
the modeling error term is used in the selection of $\mathbf{K}$. That is, reduced
dimension estimates are found for which the spectra are very accurately
represented over an ensemble of $\boldsymbol{\sigma}^2$. There is no guarantee that
the reduction in the modeling term will not be compensated by increases in the
covariance term, but experimental results show that the receiver performance is
either improved or maintained at the same level as the
$\mathbf{K} = \mathbf{I}$ case.

One of these techniques uses independent estimates of spectral
points in the neighborhood of the center of the signal portion of the
received spectrum. The more remote points are averaged together under the
assumption that they are drawn from the same noise-only distribution. The
symmetry of the spectrum about its center is also exploited by averaging
the conjugate points on each side of the center within the expected band of
signal spreading. This technique does not involve any assumptions about
$\boldsymbol{\sigma}^2$ other than the band-limited nature of the received signal
power, symmetry about the center point, and the presence of additive white noise.
It is referred to in the sequel as the band-limited-symmetric-spectrum [BLSS] data
reduction algorithm.
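
A minimal sketch of a BLSS-style projection follows. It is an interpretation of
the description above, not the report's own algorithm (which is recorded in
Appendix F): the function name `blss_basis`, the center bin index, and the choice
of three signal parameters plus one noise parameter are all assumptions made for
illustration.

```python
import numpy as np

def blss_basis(n_points=16, center=8, n_signal=3):
    """Orthonormal N x P basis with P = n_signal + 1.

    Column d (d = 0..n_signal-1) averages the pair of bins at distance d
    on either side of `center`; the last column averages all remaining
    bins, which are assumed to hold noise only.
    """
    cols, used = [], set()
    for d in range(n_signal):
        idx = sorted({(center - d) % n_points, (center + d) % n_points})
        v = np.zeros(n_points)
        v[idx] = 1.0
        cols.append(v / np.linalg.norm(v))
        used.update(idx)
    rest = [i for i in range(n_points) if i not in used]
    v = np.zeros(n_points)
    v[rest] = 1.0
    cols.append(v / np.linalg.norm(v))
    return np.column_stack(cols)

X = blss_basis()
K = X @ X.T                       # projection operator K = X X^T

# A symmetric, band-limited spectrum on a flat noise floor (hypothetical values).
spectrum = np.ones(16)
spectrum[8] += 5.0
spectrum[[7, 9]] += 3.0
spectrum[[6, 10]] += 1.0
projected = K @ spectrum
```

A spectrum that satisfies the three stated assumptions is reproduced exactly by
this projection, so the modeling error for it is zero.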

The other technique is based upon a more analytical foundation.
It makes the selection of $\mathbf{K}$ through the use of a set of specific examples
of $\boldsymbol{\sigma}^2$. These spectra have the particular functional forms and
range of parameters over which the receiver is expected to operate. This latter
technique, which is called the spectral eigenvector [SEV] algorithm, is
developed next. Results for both methods are given in Section VI.

SPECTRAL  EIGENVECTOR  ALGORITHM 

The BLSS algorithm is an obvious technique for accurately curve
fitting the spectrum. It makes use of the most salient features of the
received data, and these features are not dependent on detailed predictions
of the ionospheric channel. This simple expedient is shown in Section VI to
perform as well as the $\mathbf{K} = \mathbf{I}$ case for high signal-to-noise ratio
and slightly better for low ratios with a projection from sixteen to five
dimensions. The success of this heuristic technique motivated further
investigation of the potential of a spectral eigenvector method which
involves more detailed knowledge of the received spectrum.

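To this end we focus our attention on the modeling error portion
of the risk

\[
\mathcal{R}_m(\boldsymbol{\sigma}^2,\mathbf{K}) =
  \mathrm{tr}\{\mathbf{C}_\sigma (\mathbf{K}-\mathbf{I})
      \boldsymbol{\sigma}^2 \boldsymbol{\sigma}^{2T} (\mathbf{K}-\mathbf{I})\}
\qquad (124)
\]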
A projection matrix $\mathbf{K}$ is sought which is of low rank and causes
$\mathcal{R}_m(\boldsymbol{\sigma}^2,\mathbf{K})$ to be close to zero over the
predicted ensemble of $\boldsymbol{\sigma}^2$, independent of $\mathbf{C}_\sigma$.
Then $\mathbf{K}-\mathbf{I}$ must be effectively in the null space of
$\boldsymbol{\sigma}^2\boldsymbol{\sigma}^{2T}$ over the range of
$\boldsymbol{\sigma}^2$. In the following discussion it is shown that such a
$\mathbf{K}-\mathbf{I}$ can be obtained by a numerical evaluation of eigenvectors
and eigenvalues of the matrix

\[
\mathbf{S} = \sum_{j=1}^{J} \boldsymbol{\sigma}_j^2 \boldsymbol{\sigma}_j^{2T}
\]

which is simply a sum of dyads made up from samples of predicted
$\boldsymbol{\sigma}^2$'s over the predicted operating range of the receiver. The
matrix $\mathbf{S}$ has been experimentally shown to have a rank of four for the
conditions of interest, and that rank corresponds to the rank of $\mathbf{K}$
which will drive the modeling error to zero.
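
The eigenvector construction can be sketched numerically as follows. This is a
hedged illustration: the function name `sev_projection` is hypothetical, and the
sample spectra are synthetic vectors deliberately built to span a
four-dimensional subspace, standing in for the predicted spectra that the report
actually uses.

```python
import numpy as np

def sev_projection(spectra, rank):
    """Build S as a sum of dyads and return (K, eigenvalues sorted descending).

    K = X X^T, where the columns of X are the `rank` eigenvectors of S
    with the largest eigenvalues.
    """
    S = sum(np.outer(s, s) for s in spectra)
    w, V = np.linalg.eigh(S)            # eigh returns ascending eigenvalues
    order = np.argsort(w)[::-1]
    X = V[:, order[:rank]]
    return X @ X.T, w[order]

rng = np.random.default_rng(1)
N, P, J = 16, 4, 8
basis = rng.uniform(0.0, 1.0, (N, P))                 # synthetic spectral shapes
spectra = [basis @ rng.uniform(0.1, 1.0, P) for _ in range(J)]

K, eigvals = sev_projection(spectra, P)

# Modeling error (K - I) sigma^2 for every spectrum in the ensemble.
residual = max(np.linalg.norm((K - np.eye(N)) @ s) for s in spectra)
```

Because the synthetic ensemble has rank four, only four eigenvalues are
significantly nonzero and the rank-four projection reproduces every sample
spectrum to within rounding error.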


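These results follow from the fact that vectors in the null space
of $\mathbf{S}$ are necessarily also in the null space of each of the
$\boldsymbol{\sigma}_j^2\boldsymbol{\sigma}_j^{2T}$. To show this let us make the
following definitions. The N × N matrix $\mathbf{K}$ is symmetric since it can be
written as a sum of dyads (Equation 105). If $\mathbf{K}$ is of rank P it may be
factored into the product

\[
\mathbf{K} = \mathbf{X}\mathbf{X}^T \qquad (126)
\]

where $\mathbf{X}$ is an N × P matrix. The columns of $\mathbf{X}$ are the
orthonormal set of $\mathbf{x}_i$ referred to earlier. If the orthonormal set is
augmented to a complete N-dimensional set by a set of vectors $\mathbf{y}_i$, the
columns of $\mathbf{Y}$, then

\[
\mathbf{X}\mathbf{X}^T + \mathbf{Y}\mathbf{Y}^T = \mathbf{I} \qquad (127)
\]

and $\mathbf{Y}\mathbf{Y}^T$ is a projection operator with a range space orthogonal
to that of $\mathbf{X}\mathbf{X}^T$ and which fills out the N-dimensional space.
Now the expression for the modeling error can be rewritten

\[
(\mathbf{K}-\mathbf{I})\,\boldsymbol{\sigma}_j^2\boldsymbol{\sigma}_j^{2T}\,(\mathbf{K}-\mathbf{I})^T
 = \mathbf{Y}\mathbf{Y}^T\,\boldsymbol{\sigma}_j^2\boldsymbol{\sigma}_j^{2T}\,\mathbf{Y}\mathbf{Y}^T
\qquad (128)
\]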
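Consider the sum of J such terms, $\mathbf{Y}\mathbf{Y}^T\mathbf{S}\mathbf{Y}\mathbf{Y}^T$.
If the columns of $\mathbf{Y}$ are in the null space of $\mathbf{S}$, the trace of
this expression is zero.

\[
\mathrm{tr}\{\mathbf{Y}\mathbf{Y}^T\mathbf{S}\mathbf{Y}\mathbf{Y}^T\} = 0 \qquad (129)
\]

By the properties of the trace and the orthogonality of the columns of $\mathbf{Y}$

\[
\mathrm{tr}\{\mathbf{Y}\mathbf{Y}^T\mathbf{S}\mathbf{Y}\mathbf{Y}^T\}
 = \mathrm{tr}\{\mathbf{Y}^T\mathbf{S}\mathbf{Y}\}
 = \sum_{i=1}^{N-P} \mathbf{y}_i^T \mathbf{S}\, \mathbf{y}_i \qquad (130)
\]

But, by expanding $\mathbf{S}$ it is evident that each term is nonnegative.

\[
\sum_{i=1}^{N-P}\sum_{j=1}^{J} \mathbf{y}_i^T \boldsymbol{\sigma}_j^2 \boldsymbol{\sigma}_j^{2T} \mathbf{y}_i
 = \sum_{i=1}^{N-P}\sum_{j=1}^{J} (\mathbf{y}_i^T \boldsymbol{\sigma}_j^2)^2 \qquad (131)
\]

Then each of the component terms must be zero

\[
\mathbf{y}_i^T \boldsymbol{\sigma}_j^2 = 0, \qquad j = 1,\ldots,J, \quad i = 1,\ldots,N-P \qquad (132)
\]

which establishes the fact that the columns of $\mathbf{Y}$ are in the null space
of each of the $\boldsymbol{\sigma}_j^2\boldsymbol{\sigma}_j^{2T}$.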
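
The chain of equalities in Equations 127, 130, and 131 can be checked numerically
with a short sketch. The matrices below are random stand-ins (an arbitrary
orthonormal basis split into `X` and `Y`, and arbitrary nonnegative vectors in
place of predicted spectra); they are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
N, P, J = 16, 4, 8

# Split a random orthonormal basis into X (N x P) and its complement Y.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
X, Y = Q[:, :P], Q[:, P:]

sig = rng.uniform(0.1, 1.0, (J, N))   # row j plays the role of sigma_j^2
S = sig.T @ sig                       # sum over j of the dyads sigma_j sigma_j^T

completeness = np.allclose(X @ X.T + Y @ Y.T, np.eye(N))   # Equation 127
lhs = np.trace(Y @ Y.T @ S @ Y @ Y.T)                      # left side of Equation 130
mid = np.trace(Y.T @ S @ Y)                                # middle of Equation 130
rhs = np.sum((sig @ Y) ** 2)                               # right side of Equation 131
```

Since the final form is a sum of squares, the trace can vanish only if every
inner product $\mathbf{y}_i^T\boldsymbol{\sigma}_j^2$ vanishes, which is the step
taken in Equation 132.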


An expanded discussion of this data reduction technique is given
in Appendix G. The relationship to principal component analysis in
statistics is explored, and the case where $\mathbf{S}$ is full rank is discussed.




VI.  EXPERIMENTAL  RESULTS 


Receiver  performance  curves  for  all  of  the  receivers  under  study 
are  presented  in  this  section.  Included  are  the  performance  of  the  optimal 
receiver,  the  SPLOT  algorithm,  the  conventional  receiver,  16-parameter 
adaptive  receiver,  and  the  adaptive  receivers  using  the  BLSS  and  SEV  data 
reduction  techniques  introduced  in  Section  V. 

CONVENTIONAL  AND  OPTIMAL  RECEIVERS 

All  of  the  results  presented  here  are  for  an  uncoded  8-ary  FSK 
system  with  the  following  parameters: 

Modulation  Interval  T  =  5  ms 

Frequency  Separation  Af  =  200  Hz 

Sample  Rate  N/T  =  3200/s 

Number  of  Complex  Samples  Per  T  N  =  16 

Baseband  Width  B  =  1600  Hz 
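
The listed parameters are mutually consistent, which the following arithmetic
sketch confirms. The variable names are illustrative; the values are those given
in the list above.

```python
import math

T = 5e-3          # modulation interval, seconds
delta_f = 200.0   # frequency separation, Hz
N = 16            # complex samples per interval
M = 8             # number of tones (8-ary FSK)

sample_rate = N / T                 # complex samples per second
tone_span = M * delta_f             # total span of the eight tones, Hz
bits_per_symbol = math.log2(M)      # uncoded bits per observation interval
tones_per_bin = delta_f * T         # tone spacing in units of the DFT bin width 1/T
```

The tone spacing equals the DFT bin width 1/T, and the eight tones span the
stated 1600 Hz baseband width.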

The performance of a conventional 8-ary FSK receiver in rapid signal fading
is given for several degrees of fading rapidity along with the corresponding
optimal performance and that for the SPLOT algorithm (which are
indistinguishable in all of the cases considered). Two types of functional form
of the signal portion of the received random process are used. These are the
Gaussian signal spectrum

.  r(f*Af)Toi2 

*r'(f-fc>  =  Cle  L  W  J  (133) 
and the cubic roll-off spectrum


*r-<£-fc)  =  C2 


1 


[a+[(f^Af)To]2]3/2 


(134) 


In each case, the decorrelation time $\tau_0$ is the $e^{-1}$ point on the
autocorrelation function associated with the signal power spectral density.
The cubic roll-off spectrum is used since it is characteristic of the results
obtained with multiple phase screen numerical models of a highly ionized
ionosphere. The Gaussian spectrum is included primarily as an indicator of
the validity of decorrelation time as a measure of the difficulty to
demodulate the signal, although it also represents the ionospheric channel under
some extreme conditions. Graphs of the two spectra are shown in Figures 9
and 10 to illustrate the degree of overlap expected within the useful
operating range of the optimal receiver. Curves are shown for a range of
decorrelation times from about twice the observation interval, where there
is essentially no overlap of the spectra, to one-fifth of the observation
interval, where the signal spectra are spread well into the adjacent channels.
Degradation in the receiver performance comes from additional errors due to
the spreading of signal energy into nearby channels. Thus, the degree of
overlap shown in the figures is illustrative of the difficulty in demodulating
signals in a frequency dispersive channel. It is not intended, by the
introduction of these curves, to embark upon a discussion of comparative
performance for the two spectral shapes. Comparison of the receiver performance
for the two spectra at a given decorrelation time is not justified, since
decorrelation time is an arbitrary measure of dispersion for the spectrum,
which does not necessarily reflect the difficulty presented to the receiver
by a particular functional form. The Gaussian spectrum is somewhat more
compact than the cubic roll-off spectrum at the same decorrelation time, and
therefore the performance characteristics are correspondingly better; but
this does not imply that a Gaussian spectrum presents less difficulty to
the receiver. One should rather look at the similarity of the response
curves as an indicator that decorrelation time is a useful standard measure
of signal dispersion.

Figure 9. Gaussian signal power spectral density - various decorrelation
times, normalized signal power, $\Delta f = 1/T$ Hz. (Abscissa: frequency.)

Figure 10. Cubic roll-off power spectral density - various decorrelation
times, normalized signal power, $\Delta f = 1/T$ Hz.

Figures 11 and 12 are the receiver operating characteristic curves
for the conventional and optimal receivers in terms of normalized error
(with 0.5 representing random performance) as a function of the mean bit
energy-to-noise density. The ratio shown is 1/3 of the SNR defined by
Equation 44 of Section 2 to reflect the fact that three uncoded bits are
transmitted during each observation interval T.
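
The two normalizations mentioned above can be sketched in a few lines. The
division by three is taken from the text. The symbol-to-bit conversion is the
standard relation for equally likely orthogonal M-ary symbols and is an
assumption here, since the report's own definition of normalized error is not
reproduced in this section; it does give 0.5 for random guessing. The function
names are hypothetical.

```python
import math

def eb_n0_db(snr_db, bits_per_symbol=3):
    """Mean bit energy-to-noise density in dB: the per-symbol SNR divided by 3."""
    return snr_db - 10.0 * math.log10(bits_per_symbol)

def bit_error_from_symbol_error(p_symbol, num_tones=8):
    """Binary error probability for equally likely orthogonal symbol errors."""
    return p_symbol * (num_tones / 2) / (num_tones - 1)

random_guess = bit_error_from_symbol_error(7.0 / 8.0)   # symbol error of pure guessing
offset_db = eb_n0_db(0.0)                               # shift applied to the SNR axis
```

Dividing by three shifts the abscissa by about 4.8 dB relative to the per-symbol
signal-to-noise ratio.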

Receiver  operating  characteristic  curves  are  shown  for  the  same 
five  decorrelation  times  illustrated  in  Figures  9  and  10.  In  addition,  the 
curve  of  the  slow  fading  limit,  which  was  determined  analytically,  is  given 
as  a  means  of  comparison  of  rapid  fading  performance  to  the  corresponding 
performance  with  independent  Rayleigh  fading. 

The  slow  fading  limit  is  derived  under  the  assumption  that  the 
signal  is  constant  across  each  observation  interval  with  the  variation  from 
interval  to  interval  described  by  a  Rayleigh  distribution.  The  signal 
envelopes  are  considered  to  vary  independently  from  one  observation  interval 
to  another  in  the  slow  fading  derivation. 

The convergence of the Monte Carlo sampling procedure used to
obtain the data points in these figures depends upon the probability of
error. To account for this, more samples are used at the higher
signal-to-noise ratios, where the error probability is lower. For data points
falling above a binary probability of error of seven percent, 800 trials were
used, that is, an average of 100 trials per channel with random selection of
channels. To estimate lower probabilities, 2400 trials were used, or an average
of 300 trials per channel.
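
A rough feel for the statistical scatter implied by these trial counts can be had
from the binomial standard error. This is only a crude sketch: it treats each
trial as an independent Bernoulli draw at the binary error rate, which is an
assumption and ignores the relation between symbol and bit errors. The function
name is hypothetical.

```python
import math

def binomial_std_error(p, n_trials):
    """Standard error of an error-rate estimate from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n_trials)

# Relative scatter at the two operating points mentioned in the text.
rel_at_7_percent = binomial_std_error(0.07, 800) / 0.07
rel_at_3_percent = binomial_std_error(0.03, 2400) / 0.03
```

Under this simplification the relative standard error is roughly 13 percent at a
seven percent error rate with 800 trials and roughly 12 percent at a three
percent error rate with 2400 trials, so tripling the trial count keeps the
scatter about constant as the error rate drops.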
Figure 11. 8-ary frequency shift keying performance in fast Rayleigh
fading channel (Gaussian signal spectrum). (Ordinate: binary error probability.)
Results show a marked improvement for the performance of the
optimal receiver over the conventional receiver. At the two shorter
decorrelation times, $\tau_0 = 1.10$ ms and $\tau_0 = 1.60$ ms, the conventional
receiver makes an error in about one out of every three trials, while the optimal
receiver is in error only one out of ten to twenty trials in the mid range of
signal-to-noise ratio. The performance of the two receivers converges at lower
signal-to-noise ratios and at longer decorrelation times.

In every case shown in these figures the SPLOT algorithm was
statistically indistinguishable from the optimal performance, and is therefore
not given in a separate set of curves. The significant improvement in
performance of the SPLOT algorithm over the conventional receiver was the
motivation to pursue the design of adaptive receivers patterned after it.

An alternative way of viewing the data which highlights the degree
of operation into a rapidly fading environment is given by Figure 13.
Here the signal-to-noise ratio required to obtain a given binary
probability of error is graphed as a function of $\tau_0$ for the cubic roll-off
spectrum only. At a criterion of ten percent binary errors the optimal (and
SPLOT) receivers are functioning at fading rates two to three times more
rapid than the conventional receiver. At three percent binary errors the
cutoff fading rates are slower but the ratio remains about the same.

ADAPTIVE  RECEIVERS 

All  of  the  results  shown  for  adaptive  receivers  have  been  computed 
for  spectral  estimates  using  single  pole  recursive  averaging  filters  and 
decision  feedback.  In  Appendix  F  the  algorithms  used  in  these  receivers  are 
recorded  and  a  discussion  of  decision  feedback  is  given.  Only  the  results 
are  discussed  here. 
error rate P. at various decorrelation times.


The first adaptive receiver to be considered is one using a full
dimensional estimate, the so-called sixteen parameter adaptive receiver.
Figure 14 shows the receiver operating characteristic for this receiver versus
the optimal and conventional receiver at $\tau_0 = 3.20$ ms, an intermediate level
of signal fading.* A time constant of 40 prior observation intervals is used
in the estimator. The adaptive receiver stays within 20 percent of the
optimal performance over the entire range of signal-to-noise ratio and is
clearly near optimal in comparison to the conventional receiver except at
the lower ratios. Results for a five-parameter BLSS algorithm operating
on the same data are included. The BLSS receiver performance coincides
with the sixteen parameter receiver at high signal-to-noise ratios but shows
a slight improvement at lower levels. Subsequent data indicate that the
coincidence at high signal levels is due to a compensation of performance
degradation from modeling error and improved estimation accuracy from smoothing
in the frequency dimension. It is significant that the reduced dimensional
estimate maintains the performance of the raw estimate at a considerable
computational savings. Improvements in the performance are slight.
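
A single pole recursive average of the kind referred to above can be sketched as
follows. The algorithms actually used are recorded in Appendix F, which is not
reproduced here, so this is an assumed form: the update gain is taken as the
reciprocal of the time constant, decision feedback (realigning each sample
spectrum according to the decided tone) is omitted, and the sample spectra are
synthetic exponential draws.

```python
import numpy as np

def recursive_average(periodograms, time_constant=40.0):
    """Point-by-point single pole average over a sequence of sample spectra.

    Assumed update: est <- (1 - a) * est + a * new, with a = 1 / time_constant.
    The first sample spectrum initializes the estimate.
    """
    alpha = 1.0 / time_constant
    est = np.array(periodograms[0], dtype=float)
    for p in periodograms[1:]:
        est = (1.0 - alpha) * est + alpha * np.asarray(p, dtype=float)
    return est

rng = np.random.default_rng(4)
true_spectrum = np.ones(16)
true_spectrum[7:10] += [2.0, 5.0, 2.0]            # hypothetical signal portion
# Squared magnitudes of complex Gaussian DFT coefficients are exponential.
samples = rng.exponential(true_spectrum, size=(400, 16))
estimate = recursive_average(samples, time_constant=40.0)
```

A shorter time constant tracks channel changes faster at the cost of a noisier
estimate, which is the trade examined later with Figure 20.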

Next, we consider performance of the two competing algorithms for
data reduction of the spectral estimates. To put these on an equal footing,
four dimensional spectral estimates are used in each case. The set of
$\boldsymbol{\sigma}^2$ used in computing the matrix $\mathbf{S}$ for the SEV
algorithm is listed in Table 1. The criterion to select these was based upon a
desire to maintain the performance most carefully at higher error rates. The
decorrelation times and signal-to-noise ratios are from a slice across Figure 12
at a binary probability of error of ten percent. This level of errors is
considered about the greatest that may practically be handled with error
correction codes for this system. The best possible fit is desired in the region
of the worst tolerable case of error probability, since degraded performance due
to modeling error in lower regions of the operating characteristics will
presumably be well within the capabilities of the error correction code.

*  Differences  in  conventional  and  optimal  receiver  performance  curves 
among  Figures  11  through  20  are  due  to  statistical  variation.  All 
curves  on  each  figure  are  based  on  the  same  statistical  sample. 
Figure 14. 8-ary FSK receiver performance. Adaptive vs. optimal and
conventional receivers. (Ordinate: binary error probability.)
Table 1. Parameters of $\boldsymbol{\sigma}^2$ used in constructing the matrix $\mathbf{S}$.

    $\tau_0$ (ms)    $E_b/N_0$ (dB)

        9.1              5
        7.5              6
        6.0              7
        4.9              8
        4.0              9
        3.2             10
        1.6             15
        1.1             23

Figure 15 shows the four eigenvectors of $\mathbf{S}$ with nonzero
eigenvalues arranged in the order of descending eigenvalue. For comparison, the
four basis functions of a four-dimensional BLSS algorithm are also shown.
Note that the SEV eigenvectors seem to mimic the BLSS vectors. One vector
accentuates the center of the spectrum. Two others focus on the next to
center and second from center values, and one is evidently arranged to
compute the noise level. The performance associated with the two algorithms
is, however, remarkably different.

Figure 16 gives the receiver operating characteristic for the
four-dimensional BLSS and SEV algorithms along with the conventional and SPLOT
receiver. The integrating time constant for averaging over priors is 40
observation intervals for each of the adaptive receivers. The performance
of the SEV algorithm is right at the optimal level, but the BLSS receiver
has about 20 percent more errors in the mid range of signal-to-noise ratio.
Notice that even though the $\mathbf{S}$ matrix was made up from spectra along the
ten percent error line, performance of the SEV algorithm is on the optimal
characteristic throughout the range of signal-to-noise ratio, including much
lower error probabilities.
Figure 15. Normalized basis vectors for four-parameter SEV and
BLSS data reduction algorithms. (a) Four-parameter SEV data reduction
algorithm. (b) Four-parameter BLSS data reduction algorithm.

Figure 16. 8-ary frequency shift keying performance in fast Rayleigh
fading channel (cubic roll-off signal spectrum). Four-parameter
SEV receiver vs. four-parameter BLSS receiver. (Abscissa: mean bit
energy to noise density, dB.)


With the addition of one more parameter the BLSS algorithm can be
made to perform almost as well as the SEV algorithm. Figure 17 shows a
comparison of a five-parameter BLSS algorithm with a four-parameter SEV
algorithm under the same conditions as Figure 16. Figures 18 and 19 show that
performance does not suffer at longer decorrelation times, where poorer curve
fitting might have been suspected. The performance of both
adaptive receivers converges, along with the SPLOT receiver, to the slow
fading limit of the conventional 8-ary FSK receiver.

The integrating time constant of 40 prior observation intervals was
selected arbitrarily, but it is evidently sufficient for convergence in terms
of receiver performance. The excellent results obtained with the SEV algorithm
at that time constant are ample evidence of this. To see how short an
integrating time could be used, several runs were made with successively shorter
time constants until the performance curve started to break upwards. Figure 20
is a graph of the BLSS performance at integrating time constants of 40T
and 4T. There were only about ten percent more errors for an order of
magnitude less smoothing. Without introducing any further channel disturbances
such as jamming interference, it would be possible to use quite short
integrating times in the spectral estimate. However, jamming considerations are
likely to prohibit the use of very short prior averaging times, or otherwise
to dictate the dynamics of the spectral estimator.

CONCLUSIONS 

Adaptive  M-ary  FSK  demodulation  with  nearly  optimal  performance 
is  feasible  for  rapid  signal  fading  conditions.  Spectral  estimates  which 
use  decision  feedback  and  single  pole  averaging  filters  with  time  constants 
of  less  than  40  prior  observation  intervals  have  been  demonstrated  to  achieve 
optimal  or  nearly  optimal  performance.  The  advantage  over  the  conventional 
8-ary  FSK  receiver  is  that  the  adaptive  receiver  can  operate  at  a  factor 
of  two  to  three  more  rapid  signal  fading. 
Figure 19. 8-ary frequency shift keying performance in fast Rayleigh
fading channel (cubic roll-off signal spectrum). Four-parameter
SEV receiver vs. five-parameter BLSS receiver at $\tau_0 = 9.10$ ms.

Figure 20. Comparative performance of BLSS algorithm at long and
short prior integrating times.


The spectral estimate can be linearly parameterized to four or five
parameters without any significant degradation in performance. Two techniques
of data reduction were investigated. One of these, the band limited
symmetric spectrum (BLSS) algorithm, uses no detailed knowledge of the signal
portion of the spectrum except that it is limited to a few DFT coefficients
and is symmetric. The other technique, which is called the spectral
eigenvector (SEV) algorithm, uses an ensemble of predicted spectra for the
received random process to derive basis vectors for the spectral estimate. The SEV
algorithm with four basis vectors and the BLSS algorithm with five basis
vectors achieve nearly optimal performance. If the spectral functional
form can be reliably predicted, the SEV algorithm would be preferred, whereas
the BLSS method is applicable to a larger family of spectra which do not have
to be predicted in detail.

The  introduction  of  a  spectral  estimator  to  the  receiver  may  make 
it  more  vulnerable  to  other  sources  of  channel  disturbance.  For  instance, 
an  intentional  jamming  signal  may  be  directed  toward  upsetting  the  spectral 
estimate  as  well  as  the  demodulation  decision.  Incorporation  of  deliberate 
jamming  threats  in  the  channel  is  a  primary  direction  for  further  research 
in  this  area.  The  jamming  source  should  be  considered  a  part  of  the  channel 
as  is  the  scintillation  source,  and  the  fundamental  receiver  algorithm  for 
minimum  probability  of  error  should  be  rederived  for  a  channel  with  rapid 
signal  fading  and  deliberate  interference.  If  the  combined  threat  problem 
could  be  put  on  as  firm  a  theoretical  foundation  as  the  rapid  signal  fading 
case,  the  adaptive  receiver  would  be  firmly  established  as  the  preferred 
system  for  the  rapidly  fading  channel.  As  it  is  today,  there  remains  a 
need  to  evaluate  the  performance  of  the  adaptive  receiver  against  the 
combined  scintillation  and  jamming  threat. 
APPENDIX A


PROPERTIES  OF  THE  IN-PHASE  AND  QUADRATURE 
COMPONENTS  OF  A  NARROWBAND  GAUSSIAN  PROCESS 


The received signal-plus-noise may be expressed as a narrowband
random process referred to the carrier frequency $\omega_c$:

\[
r(t) = s(t) + n(t) = x(t)\cos\omega_c t - y(t)\sin\omega_c t \qquad \text{A.1}
\]

In Section II it is shown how the in-phase and quadrature
signals x(t) and y(t) may be extracted from r(t) through a combination
of mixing and filtering. In order that there be an efficient representation
of the probability density of the DFT components of the sampled version
of the preenvelope signal x(t) + j y(t), the following two properties
must hold for x(t) and y(t)

\[
E\{x(t)x(t-\tau)\} = E\{y(t)y(t-\tau)\} \qquad \text{A.2}
\]
\[
E\{x(t)y(t-\tau)\} = -E\{y(t)x(t-\tau)\} \qquad \text{A.3}
\]

That  these  properties  hold  for  the  narrowband  Gaussian  Process  r(t)  is  a 
standard  result  in  communication  theory,  which  is  repeated  here  for  the 
convenience  of  the  reader.  The  derivation  shown  here  follows  that  in 
Whalen17  rather  closely. 

The Hilbert Transform of r(t) is

\[
\hat{r}(t) = x(t)\sin\omega_c t + y(t)\cos\omega_c t \qquad \text{A.4}
\]
Solving Equations A.1 and A.4 for x(t) and y(t) results in

\[
x(t) = r(t)\cos\omega_c t + \hat{r}(t)\sin\omega_c t
\]
\[
y(t) = \hat{r}(t)\cos\omega_c t - r(t)\sin\omega_c t
\]
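
The inversion above can be checked numerically on a simple test signal. This is
a sketch under stated assumptions: the in-phase and quadrature components are
taken as constants, the carrier completes an integer number of cycles in the
window so that the discrete Hilbert transform is exact, and the helper
`hilbert_transform` is a hypothetical FFT-based implementation, not anything
from the report.

```python
import numpy as np

def hilbert_transform(r):
    """Discrete Hilbert transform: imaginary part of the FFT-based analytic signal."""
    n = len(r)
    spectrum = np.fft.fft(r)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.imag(np.fft.ifft(spectrum * h))

n = 1024
t = np.arange(n) / n
wc = 2.0 * np.pi * 64.0            # 64 carrier cycles in the window
x0, y0 = 0.7, -0.3                 # hypothetical constant I and Q values

r = x0 * np.cos(wc * t) - y0 * np.sin(wc * t)        # Equation A.1
r_hat = hilbert_transform(r)                          # Equation A.4

x = r * np.cos(wc * t) + r_hat * np.sin(wc * t)
y = r_hat * np.cos(wc * t) - r * np.sin(wc * t)
```

Both recovered components match the constants used to build the signal to within
rounding error.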

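Using the properties of the Hilbert Transform:*

\[
R_{r\hat{r}}(\tau) = -\hat{R}_r(\tau) \qquad \text{A.5}
\]
\[
R_{\hat{r}r}(\tau) = \hat{R}_r(\tau) \qquad \text{A.6}
\]
\[
R_{\hat{r}}(\tau) = R_r(\tau) \qquad \text{A.7}
\]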

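the autocorrelation of x(t) may be expressed

\[
\begin{aligned}
R_x(\tau) = E\{x(t)x(t-\tau)\}
 &= E\{r(t)r(t-\tau)\}\cos\omega_c t\,\cos\omega_c(t-\tau) \\
 &\quad + E\{\hat{r}(t)\hat{r}(t-\tau)\}\sin\omega_c t\,\sin\omega_c(t-\tau) \\
 &\quad + E\{r(t)\hat{r}(t-\tau)\}\cos\omega_c t\,\sin\omega_c(t-\tau) \\
 &\quad + E\{\hat{r}(t)r(t-\tau)\}\sin\omega_c t\,\cos\omega_c(t-\tau) \\
 &= R_r(\tau)\cos\omega_c t\,\cos\omega_c(t-\tau) + R_{\hat{r}r}(\tau)\sin\omega_c t\,\cos\omega_c(t-\tau) \\
 &\quad + R_{r\hat{r}}(\tau)\cos\omega_c t\,\sin\omega_c(t-\tau) + R_{\hat{r}}(\tau)\sin\omega_c t\,\sin\omega_c(t-\tau)
\end{aligned}
\]

Using Equations A.5, A.6, and A.7,

\[
\begin{aligned}
R_x(\tau) &= R_r(\tau)\cos\omega_c t\,\cos\omega_c(t-\tau) + \hat{R}_r(\tau)\sin\omega_c t\,\cos\omega_c(t-\tau) \\
 &\quad - \hat{R}_r(\tau)\cos\omega_c t\,\sin\omega_c(t-\tau) + R_r(\tau)\sin\omega_c t\,\sin\omega_c(t-\tau) \\
 &= R_r(\tau)\,[\cos\omega_c t\,\cos\omega_c(t-\tau) + \sin\omega_c t\,\sin\omega_c(t-\tau)] \\
 &\quad - \hat{R}_r(\tau)\,[\cos\omega_c t\,\sin\omega_c(t-\tau) - \sin\omega_c t\,\cos\omega_c(t-\tau)]
\end{aligned}
\]
\[
R_x(\tau) = R_r(\tau)\cos\omega_c\tau + \hat{R}_r(\tau)\sin\omega_c\tau \qquad \text{A.8}
\]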

* The Hilbert transform is $\hat{g}(t) = \pi^{-1}\int_{-\infty}^{\infty} g(t-x)\,x^{-1}\,dx$.
Note that it exists for sample functions of a stationary random process, unlike
the Fourier transform, and is also well defined for auto- and cross
correlation functions.
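It may be shown by a similar development that

\[
R_y(\tau) = R_r(\tau)\cos\omega_c\tau + \hat{R}_r(\tau)\sin\omega_c\tau \qquad \text{A.9}
\]

so that

\[
R_x(\tau) = R_y(\tau) \qquad \text{A.10}
\]

which is the property stated in Equation A.2.

Next

\[
\begin{aligned}
R_{xy}(\tau) = E\{x(t)y(t-\tau)\}
 &= E\{r(t)\hat{r}(t-\tau)\}\cos\omega_c t\,\cos\omega_c(t-\tau) \\
 &\quad + E\{\hat{r}(t)\hat{r}(t-\tau)\}\sin\omega_c t\,\cos\omega_c(t-\tau) \\
 &\quad - E\{r(t)r(t-\tau)\}\cos\omega_c t\,\sin\omega_c(t-\tau) \\
 &\quad - E\{\hat{r}(t)r(t-\tau)\}\sin\omega_c t\,\sin\omega_c(t-\tau) \\
 &= R_{r\hat{r}}(\tau)\cos\omega_c t\,\cos\omega_c(t-\tau) + R_{\hat{r}}(\tau)\sin\omega_c t\,\cos\omega_c(t-\tau) \\
 &\quad - R_r(\tau)\cos\omega_c t\,\sin\omega_c(t-\tau) - R_{\hat{r}r}(\tau)\sin\omega_c t\,\sin\omega_c(t-\tau)
\end{aligned}
\]

Using Equations A.5, A.6, and A.7

\[
\begin{aligned}
R_{xy}(\tau) &= R_r(\tau)\,[\sin\omega_c t\,\cos\omega_c(t-\tau) - \cos\omega_c t\,\sin\omega_c(t-\tau)] \\
 &\quad - \hat{R}_r(\tau)\,[\cos\omega_c t\,\cos\omega_c(t-\tau) + \sin\omega_c t\,\sin\omega_c(t-\tau)]
\end{aligned}
\]
\[
R_{xy}(\tau) = R_r(\tau)\sin\omega_c\tau - \hat{R}_r(\tau)\cos\omega_c\tau \qquad \text{A.11}
\]

A parallel analysis results in

\[
R_{xy}(\tau) = -R_{yx}(\tau) \qquad \text{A.12}
\]

which is the property stated in Equation A.3.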
If the noise term n(t) in Equation A.1 is white (uncorrelated) then Equation
A.8 may be rewritten

    R_x(\tau) = R_s(\tau) \cos\omega_c\tau + \hat{R}_s(\tau) \sin\omega_c\tau + \frac{N_0 B}{4} \delta(\tau)
              = R_{xs}(\tau) + \frac{N_0 B}{4} \delta(\tau)                             A.13

where N_0 is the noise power spectral density of n(t) in watts/Hz, B is the
noise bandwidth of the baseband analog filter, and R_{xs}(\tau) is the signal
component of the baseband autocorrelation R_x(\tau). Also from A.11

    R_{xy}(\tau) = R_s(\tau) \sin\omega_c\tau - \hat{R}_s(\tau) \cos\omega_c\tau       A.14

The white noise does not contribute to the crosscorrelation of the baseband
waveforms x(t) and y(t).
APPENDIX B

THE PROBABILITY DISTRIBUTION OF THE DFT
OF A SAMPLED COMPLEX GAUSSIAN PROCESS


The development of Appendix A shows that the baseband in-phase
and quadrature waveforms have certain statistical regularities given by
Equations A.2 and A.3. If an impulse sampler is assumed, the corresponding
properties of the sampled waveforms are

    E\{x_n x_m\} = E\{y_n y_m\}                                                        B.1

    E\{x_n y_m\} = -E\{y_n x_m\}                                                       B.2

In order to express the probability distribution of the DFT coefficients of
x_n + j y_n in an efficient manner, it is necessary that a similar property
hold in the frequency domain. Let

    z_k = X_k + j Y_k = \frac{1}{N} \sum_{n=0}^{N-1} (x_n + j y_n) e^{-j 2\pi k n / N},   k = 0,...,N-1      B.3

with X_k and Y_k real valued, be the set of DFT coefficients of the
sequence

    x_n + j y_n,   n = 0,1,2,...,N-1

Then the desired properties are

    E\{X_k X_\ell\} = E\{Y_k Y_\ell\}                                                  B.4

    E\{X_k Y_\ell\} = -E\{Y_k X_\ell\}                                                 B.5
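Proceeding directly from Equation B.3,

    X_k = \mathrm{Re}\{ \frac{1}{N} \sum_{n=0}^{N-1} (x_n + j y_n) e^{-j 2\pi k n / N} \}
        = \frac{1}{N} \sum_{n=0}^{N-1} [ x_n \cos\frac{2\pi k n}{N} + y_n \sin\frac{2\pi k n}{N} ]          B.6

    Y_k = \mathrm{Im}\{ \frac{1}{N} \sum_{n=0}^{N-1} (x_n + j y_n) e^{-j 2\pi k n / N} \}
        = \frac{1}{N} \sum_{n=0}^{N-1} [ -x_n \sin\frac{2\pi k n}{N} + y_n \cos\frac{2\pi k n}{N} ]         B.7

Then

    E\{X_k X_\ell\} = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} [
                        E\{x_n x_m\} \cos\frac{2\pi k n}{N} \cos\frac{2\pi \ell m}{N}
                      + E\{y_n y_m\} \sin\frac{2\pi k n}{N} \sin\frac{2\pi \ell m}{N}
                      + E\{y_n x_m\} \sin\frac{2\pi k n}{N} \cos\frac{2\pi \ell m}{N}
                      + E\{x_n y_m\} \cos\frac{2\pi k n}{N} \sin\frac{2\pi \ell m}{N} ]
                    = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} [
                        E\{x_n x_m\} \cos\frac{2\pi (k n - \ell m)}{N}
                      + E\{y_n x_m\} \sin\frac{2\pi (k n - \ell m)}{N} ]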
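    E\{Y_k Y_\ell\} = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} [
                        E\{x_n x_m\} \sin\frac{2\pi k n}{N} \sin\frac{2\pi \ell m}{N}
                      + E\{y_n y_m\} \cos\frac{2\pi k n}{N} \cos\frac{2\pi \ell m}{N}
                      - E\{y_n x_m\} \cos\frac{2\pi k n}{N} \sin\frac{2\pi \ell m}{N}
                      - E\{x_n y_m\} \sin\frac{2\pi k n}{N} \cos\frac{2\pi \ell m}{N} ]
                    = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} [
                        E\{x_n x_m\} \cos\frac{2\pi (k n - \ell m)}{N}
                      + E\{y_n x_m\} \sin\frac{2\pi (k n - \ell m)}{N} ]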
which proves Equation B.4, and

    E\{X_k Y_\ell\} = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} [
                      - E\{x_n x_m\} \cos\frac{2\pi k n}{N} \sin\frac{2\pi \ell m}{N}
                      + E\{y_n y_m\} \sin\frac{2\pi k n}{N} \cos\frac{2\pi \ell m}{N}
                      - E\{y_n x_m\} \sin\frac{2\pi k n}{N} \sin\frac{2\pi \ell m}{N}
                      + E\{x_n y_m\} \cos\frac{2\pi k n}{N} \cos\frac{2\pi \ell m}{N} ]
                    = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} [
                        E\{x_n x_m\} \sin\frac{2\pi (k n - \ell m)}{N}
                      - E\{y_n x_m\} \cos\frac{2\pi (k n - \ell m)}{N} ]

    E\{Y_k X_\ell\} = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} [
                        E\{y_n y_m\} \cos\frac{2\pi k n}{N} \sin\frac{2\pi \ell m}{N}
                      + E\{y_n x_m\} \cos\frac{2\pi k n}{N} \cos\frac{2\pi \ell m}{N}
                      - E\{x_n y_m\} \sin\frac{2\pi k n}{N} \sin\frac{2\pi \ell m}{N}
                      - E\{x_n x_m\} \sin\frac{2\pi k n}{N} \cos\frac{2\pi \ell m}{N} ]
                    = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} [
                      - E\{x_n x_m\} \sin\frac{2\pi (k n - \ell m)}{N}
                      + E\{y_n x_m\} \cos\frac{2\pi (k n - \ell m)}{N} ]                  B.11

which proves Equation B.5.
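
The step from B.1 and B.2 to B.4 and B.5 can be checked numerically. The sketch below is not part of the original report: the moment functions `r` and `q` are arbitrary illustrative choices that merely satisfy B.1 and B.2, and `dft_moments` evaluates the four double sums obtained by multiplying out Equations B.6 and B.7.

```python
import math

N = 4  # small DFT length, chosen only to keep the check fast


def r(n, m):
    # Assumed E{x_n x_m} = E{y_n y_m}: symmetric in (n, m), as B.1 requires.
    return 0.5 ** abs(n - m)


def q(n, m):
    # Assumed E{x_n y_m}: antisymmetric in (n, m), so that
    # E{y_n x_m} = q(m, n) = -q(n, m), as B.2 requires.
    return 0.3 * math.sin(0.7 * (n - m))


def dft_moments(k, l):
    """Return (E{X_k X_l}, E{Y_k Y_l}, E{X_k Y_l}, E{Y_k X_l})."""
    xx = yy = xy = yx = 0.0
    for n in range(N):
        for m in range(N):
            ca = math.cos(2 * math.pi * k * n / N)
            sa = math.sin(2 * math.pi * k * n / N)
            cb = math.cos(2 * math.pi * l * m / N)
            sb = math.sin(2 * math.pi * l * m / N)
            exx = eyy = r(n, m)
            exy = q(n, m)  # E{x_n y_m}
            eyx = q(m, n)  # E{y_n x_m}
            # X_k = (1/N) sum(x cos + y sin), Y_k = (1/N) sum(-x sin + y cos)
            xx += exx * ca * cb + exy * ca * sb + eyx * sa * cb + eyy * sa * sb
            yy += exx * sa * sb - exy * sa * cb - eyx * ca * sb + eyy * ca * cb
            xy += -exx * ca * sb + exy * ca * cb - eyx * sa * sb + eyy * sa * cb
            yx += -exx * sa * cb - exy * sa * sb + eyx * ca * cb + eyy * ca * sb
    scale = 1.0 / N ** 2
    return xx * scale, yy * scale, xy * scale, yx * scale


moments = {(k, l): dft_moments(k, l) for k in range(N) for l in range(N)}
max_err_b4 = max(abs(v[0] - v[1]) for v in moments.values())  # B.4 residual
max_err_b5 = max(abs(v[2] + v[3]) for v in moments.values())  # B.5 residual
print(max_err_b4, max_err_b5)
```

Both residuals should vanish to rounding error for any symmetric `r` and antisymmetric `q`, since the identity is purely algebraic.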


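These properties allow the covariance matrix of combined real
and imaginary parts of the DFT coefficients to be written in the form

    \Lambda = E\{ \begin{pmatrix} \alpha \\ \gamma \end{pmatrix} \begin{pmatrix} \alpha^T & \gamma^T \end{pmatrix} \}
            = \begin{pmatrix} A & B \\ B^T & A \end{pmatrix}                             B.12

where

    \begin{pmatrix} \alpha^T & \gamma^T \end{pmatrix} = [X_0, ..., X_{N-1}, Y_0, ..., Y_{N-1}]      B.13

and

    A = A^T                                                                            B.14a
    B = -B^T                                                                           B.14b

Wooding43 has shown that if the covariance matrix of a collection of zero
mean Gaussian random variables \alpha and \gamma can be written in this form, then
the probability density of the complex variates z = \alpha + j\gamma may be expressed
as

    p(z) = \pi^{-N} |L|^{-1} e^{-z^{T*} L^{-1} z}                                      B.15

where L is a Hermitian covariance matrix for z.

His development follows from the decomposition

    \begin{pmatrix} A & B \\ B^T & A \end{pmatrix}^{-1}
      = \begin{pmatrix} A + B A^{-1} B & 0 \\ 0 & A + B A^{-1} B \end{pmatrix}^{-1}
        \begin{pmatrix} A & -B \\ -B^T & A \end{pmatrix}
        \begin{pmatrix} A^{-1} & 0 \\ 0 & A^{-1} \end{pmatrix}                          B.16

      = \begin{pmatrix} (A + B A^{-1} B)^{-1} & -(A + B A^{-1} B)^{-1} B A^{-1} \\
                        -(A + B A^{-1} B)^{-1} B^T A^{-1} & (A + B A^{-1} B)^{-1} \end{pmatrix}      B.17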
which is of the same form as Equation B.12 since

    [ (A + B A^{-1} B)^{-1} ]^T = (A^T + B^T A^{-1T} B^T)^{-1}
                                = (A + B A^{-1} B)^{-1}                                B.18

and

    [ (A + B A^{-1} B)^{-1} B A^{-1} ]^T = A^{-1T} B^T (A + B A^{-1} B)^{-1T}
                                         = -A^{-1} B (A + B A^{-1} B)^{-1}
                                         = -[ (A + B A^{-1} B) B^{-1} A ]^{-1}
                                         = -[ A B^{-1} A + B ]^{-1}
                                         = -[ A B^{-1} (A + B A^{-1} B) ]^{-1}
                                         = -(A + B A^{-1} B)^{-1} B A^{-1}             B.19

Then one may write

    \begin{pmatrix} A & B \\ B^T & A \end{pmatrix}^{-1}
      = \begin{pmatrix} P & Q \\ Q^T & P \end{pmatrix}                                   B.20

where

    P = (A + B A^{-1} B)^{-1}                                                          B.21a

    Q = -(A + B A^{-1} B)^{-1} B A^{-1}                                                B.21b
with

or

    |\Lambda|^{-1/2} = |A \pm jB|^{-1}                                                 B.28

Then the distribution of Equation B.24 may be rewritten

    f(\alpha, \gamma) = (2\pi)^{-N} |A - jB|^{-1}
                        e^{ -\frac{1}{2} (\alpha - j\gamma)^T (A - jB)^{-1} (\alpha + j\gamma) }      B.29

but

    E\{ (\alpha + j\gamma)(\alpha - j\gamma)^T \}
      = E\{ \alpha\alpha^T + j\gamma\alpha^T - j\alpha\gamma^T + \gamma\gamma^T \}
      = 2 E\{ \alpha\alpha^T - j\alpha\gamma^T \}
      = 2 [A - jB]                                                                     B.30

then

    |L| = | E\{ z z^{T*} \} | = 2^N |A - jB|                                           B.31

and the probability density becomes

    f(z) = \pi^{-N} |L|^{-1} e^{-z^{T*} L^{-1} z}                                      B.32

or equivalently as

    f(z) = \pi^{-N} |L|^{-1} e^{-z^T L^{-1*} z^*}

which is occasionally used in this report.
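
The block inverse of Equations B.20 and B.21 is easy to check on a small case. The following sketch is an illustration only: the 2 x 2 matrices `A` and `B` are arbitrary values chosen to satisfy B.14a and B.14b, and the helper names are hypothetical.

```python
def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def scale(X, c):
    return [[c * a for a in row] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inv2(X):
    """Inverse of a 2 x 2 matrix by the adjugate formula."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def block(tl, tr, bl, br):
    """Assemble a 4 x 4 matrix from four 2 x 2 blocks."""
    top = [r1 + r2 for r1, r2 in zip(tl, tr)]
    bottom = [r1 + r2 for r1, r2 in zip(bl, br)]
    return top + bottom


A = [[2.0, 0.5], [0.5, 3.0]]    # A = A^T   (B.14a)
B = [[0.0, 0.7], [-0.7, 0.0]]   # B = -B^T  (B.14b)

A_inv = inv2(A)
S = add(A, matmul(matmul(B, A_inv), B))        # A + B A^{-1} B
P = inv2(S)                                    # B.21a
Q = scale(matmul(matmul(P, B), A_inv), -1.0)   # B.21b

M = block(A, B, transpose(B), A)               # form of B.12
M_inv = block(P, Q, transpose(Q), P)           # form of B.20
product = matmul(M, M_inv)                     # should be the 4 x 4 identity
print(product)
```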
APPENDIX C

COMPUTATION OF THE COVARIANCE MATRICES L_i FROM THE
SIGNAL AUTOCORRELATION FUNCTION AND WHITE NOISE DENSITY


The covariance of the complex variates z_k and z_\ell is

    E\{z_k z_\ell^*\} = E\{ \frac{1}{N} \sum_{n=0}^{N-1} (x_n + j y_n) e^{-j 2\pi k n / N}
                            \frac{1}{N} \sum_{m=0}^{N-1} (x_m - j y_m) e^{j 2\pi \ell m / N} \}          C.1

                      = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1}
                        E\{ x_n x_m + y_n y_m + j y_n x_m - j x_n y_m \}
                        e^{-j 2\pi k n / N} e^{j 2\pi \ell m / N}                        C.2

Using Equations A.10 and A.12 in sampled versions,

    E\{z_k z_\ell^*\} = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1}
                        2 ( R_x(n-m) + j R_{yx}(n-m) )
                        e^{-j 2\pi k n / N} e^{j 2\pi \ell m / N}                        C.3

and by Equation A.13

    E\{z_k z_\ell^*\} = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1}
                        2 ( R_{xs}(n-m) - j R_{xys}(n-m) )
                        e^{-j 2\pi k n / N} e^{j 2\pi \ell m / N}
                      + \frac{N_0 B}{2} \delta(k,\ell)                                  C.3b
where B is the noise bandwidth of the baseband analog filter preceding
the sampler. (It is assumed here that the white noise term n(t) affects
only the zero-lag point of the discrete-time autocorrelation.)

Equation C.3 may be implemented using a two dimensional DFT with
a rearrangement of terms to accommodate the two different signs on the
exponents. The two dimensional DFT has the form

    X(k,\ell) = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} x(n,m)
                e^{-j 2\pi k n / N} e^{-j 2\pi \ell m / N}                              C.4

To obtain Equation C.3 the columns of X(k,\ell) between \ell = 1 and \ell = N-1
are reversed in order, giving

    X(k,N-\ell) = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} x(n,m)
                  e^{-j 2\pi k n / N} e^{-j 2\pi (N-\ell) m / N}                        C.5

The integer rotation e^{-j 2\pi N m / N} = 1 may be dropped from the expression and
Equation C.3 results.
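
The column-reversal argument can be illustrated with a direct computation. This is a sketch under stated assumptions: the lag sequence `R` is a made-up Hermitian sequence standing in for the bracketed term of Equation C.3b (the noise term is omitted), and the double sums are evaluated by brute force rather than with a fast transform.

```python
import cmath

N = 4  # illustrative size only


def R(d):
    # Hypothetical Hermitian lag sequence, R(-d) = conj(R(d)).
    return (0.6 ** abs(d)) * cmath.exp(-0.4j * d)


def w(sign, k, n):
    """Complex exponential exp(sign * j * 2*pi*k*n / N)."""
    return cmath.exp(sign * 2j * cmath.pi * k * n / N)


def direct_cov(k, l):
    """Double sum with opposite exponent signs, as in Equation C.3."""
    total = sum(R(n - m) * w(-1, k, n) * w(+1, l, m)
                for n in range(N) for m in range(N))
    return total / N ** 2


def dft2(k, l):
    """Ordinary two dimensional DFT, as in Equation C.4."""
    total = sum(R(n - m) * w(-1, k, n) * w(-1, l, m)
                for n in range(N) for m in range(N))
    return total / N ** 2


cov_direct = [[direct_cov(k, l) for l in range(N)] for k in range(N)]
# Column reversal of C.5: column l of the result is column (N - l) mod N of X.
cov_from_dft = [[dft2(k, (N - l) % N) for l in range(N)] for k in range(N)]

max_diff = max(abs(cov_direct[k][l] - cov_from_dft[k][l])
               for k in range(N) for l in range(N))
print(max_diff)
```

Column 0 stays in place and columns 1 through N-1 swap end for end, which is what "reversed in order" amounts to in index terms.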

Under the assumption that the signal autocorrelation function has
the same envelope for any hypothesis of signal transmitted, or equivalently
that the signal PSD's are simple translations of one another

    R_{r_i}(\tau) = f(\tau) \cos\omega_i\tau + \frac{N_0}{2} \delta(\tau),              C.6

where \omega_i is the transmitted frequency on the ith hypothesis, the
various L_i are closely related and may be calculated with one operation
of the two dimensional DFT. It is equivalent to assume that the spectrum
of n(t) is symmetric about the frequency \omega_i, and that the baseband
lowpass filter is wide enough to pass the entire signal spectrum. The
assumption C.6 is highly restrictive and does not hold where the signal
doppler spreading is a significant portion of the width of the baseband
analog filter. The technique described here to obtain all of the covariance
matrices as a rotationally related set is not actually used in performance
calculations for the receiver, but is merely recorded here for its
theoretical value as an aid to understanding the relationships among the
covariance matrices.


where \delta is the frequency deviation assumed to be a multiple of the
sampling frequency

and all of the L_i may be obtained from one DFT operation which uses only
the envelope f(n) of the autocorrelation of the signal s(t) and the
noise spectral density. The alternate L_i's are obtained by a rotating
shift along the main diagonal.
APPENDIX D

THE GENERATION OF COMPLEX RANDOM VARIATES
WITH THE CORRELATION MATRIX L


Starting with a set of independent complex normal variates S
with the distribution,

    f(S) = \pi^{-N} e^{-S^{T*} S}                                                      D.1

it will be straightforward to obtain a set of variates with the desired
covariance L. L may be factored

    L = 1/2 (A - jB) = 1/2 (C - jD)(C + jD),                                           D.2

and that C and D matrices exist to satisfy D.2 follows from the
relationship

    A = C^2 + D^2                                                                      D.3

    B = DC - CD                                                                        D.4

which may be expressed in the form

    D.5

    D.6

The square root of A exists since, as the covariance matrix of Gaussian
real variables, it is positive definite. Then, given a set of variables
S, they are transformed by

    z = [C - jD] S.                                                                    D.7

The resulting variables have the density

    p(z) = \pi^{-N} |L|^{-1} e^{-z^{T*} L^{-1} z}                                      D.8

where |L| is the Jacobian of the transformation from S to z.
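
A simulation needs some factor G with G G^{T*} = L in order to color independent variates. The sketch below deliberately swaps in a Cholesky factor for the square-root factor C - jD of Equation D.7, because a Cholesky factor is easy to compute without a matrix library; any factor with G G^{T*} = L gives the same covariance. The matrix `L`, the function names, and the seed are illustrative assumptions.

```python
import math
import random


def cholesky_hermitian(L):
    """Lower-triangular G with G G^H = L, for Hermitian positive definite L."""
    n = len(L)
    G = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(G[i][k] * G[j][k].conjugate() for k in range(j))
            if i == j:
                # Diagonal entries are real and positive.
                G[i][i] = complex(math.sqrt((L[i][i] - s).real), 0.0)
            else:
                G[i][j] = (L[i][j] - s) / G[j][j]
    return G


def draw_variates(G, rng):
    """Return z = G s, where s has independent unit-variance complex normal entries."""
    n = len(G)
    sd = math.sqrt(0.5)  # real and imaginary parts each carry half the variance
    s = [complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in range(n)]
    return [sum(G[i][k] * s[k] for k in range(n)) for i in range(n)]


# Hypothetical 2 x 2 Hermitian covariance matrix.
L = [[2.0 + 0j, 0.5 - 0.5j],
     [0.5 + 0.5j, 1.5 + 0j]]

G = cholesky_hermitian(L)
rebuilt = [[sum(G[i][k] * G[j][k].conjugate() for k in range(2))
            for j in range(2)] for i in range(2)]   # should reproduce L
z = draw_variates(G, random.Random(0))
print(rebuilt, z)
```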
APPENDIX E

PROPERTIES OF THE COVARIANCE MATRIX WHICH ARE
USEFUL IN OBTAINING INVERSES AND SQUARE ROOTS


The covariance matrices L_i are complex valued in general and
Hermitian. To obtain the optimal receiver's statistics it is necessary
to invert these matrices, and to obtain sample variates it is necessary
to take the square root of one of them. The phase angles of the elements of
these matrices are unrelated to the received signal spectrum and are
an artifact of the sampling process. They occur in an orderly fashion
that enables a simple procedure to compute inverses and square roots
based upon the corresponding inverses and square roots of the magnitude
matrices. The phase property is obtained via the complex conjugate
of Equation C.3b (the noise term is deleted here since it is always
real and affects only the main diagonal of the covariance matrix)

    E^*\{z_k z_\ell^*\} = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1}
                          R_{zs}^*(n-m) e^{j 2\pi k n / N} e^{-j 2\pi \ell m / N}       E.1

    R_{zs}^*(n-m) = 2 ( R_{xs}(n-m) + j R_{xys}(n-m) )                                  E.2

Since R_{zs} is an autocorrelation function of a complex Gaussian baseband
signal, it is Hermitian, so that the argument (n-m) may be reversed and
the conjugate sign removed:

    E^*\{z_k z_\ell^*\} = \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1}
                          R_{zs}(m-n) e^{j 2\pi k n / N} e^{-j 2\pi \ell m / N}         E.3
Now by changing variables to

    n' = N - 1 - n
    m' = N - 1 - m

the expression may be rewritten as

    E^*\{z_k z_\ell^*\} = \frac{1}{N^2} \sum_{n'=0}^{N-1} \sum_{m'=0}^{N-1}
                          R_{zs}(n'-m') e^{j 2\pi k (N-1-n') / N} e^{-j 2\pi \ell (N-1-m') / N}
                        = e^{j 2\pi (\ell - k) / N} E\{z_k z_\ell^*\}                   E.5

This is a simple relationship between the elements of the covariance matrix
and the corresponding complex conjugates. From this expression one may obtain
the result that the phase angle of each element of the covariance matrix is

    \theta_{k,\ell} = \frac{\pi (k - \ell)}{N}  \pmod{\pi},    \ell = 0,...,N-1
                                                                 k = 0,...,N-1          E.6

which prescribes the phase angle structure of any of the covariance matrices
L_i. The angles are zero on the main diagonal, and the matrix can be arranged
in polar form so that angles decrease in steps of \pi/N for elements to the
right of the main diagonal and increase by the same increments for elements
to the left of the main diagonal. In this form, the phase angles in any row
or column range over a phase difference of \pi.


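Then for matrices of the form

    [L]_{k,\ell} = \gamma_{k,\ell} e^{j \pi (k - \ell) / N}                             E.7

where \gamma_{k,\ell} may be a positive or negative real number, if the inverse
of the matrix of elements \gamma_{k,\ell} is known and has elements \eta_{k,\ell}
such that

    \sum_{n=0}^{N-1} \gamma_{k,n} \eta_{n,\ell} = \delta_{k,\ell}

then the inverse of L has elements \eta_{k,\ell} e^{j \pi (k - \ell) / N}, since

    \sum_{n=0}^{N-1} \gamma_{k,n} e^{j \pi (k - n) / N} \eta_{n,\ell} e^{j \pi (n - \ell) / N}
      = e^{j \pi (k - \ell) / N} \sum_{n=0}^{N-1} \gamma_{k,n} \eta_{n,\ell}
      = \delta_{k,\ell}

A similar result holds for the square root. If the square root of the
matrix of elements \gamma_{k,\ell} is known, with elements \beta_{k,\ell} such that

    \sum_n \beta_{k,n} \beta_{n,\ell} = \gamma_{k,\ell}                                 E.10

then the square root of L (if it exists) is a matrix of elements
\beta_{k,\ell} e^{j \pi (k - \ell) / N}, since

    \sum_n \beta_{k,n} e^{j \pi (k - n) / N} \beta_{n,\ell} e^{j \pi (n - \ell) / N}
      = e^{j \pi (k - \ell) / N} \sum_n \beta_{k,n} \beta_{n,\ell}
      = \gamma_{k,\ell} e^{j \pi (k - \ell) / N}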

APPENDIX  F 


ALGORITHM  FOR  THE  ADAPTIVE  RECEIVER 
WITH  DATA  REDUCTION 


The  adaptive  receiver  algorithm  is  presented  here  in  detail  for 
the  reader  who  is  interested  in  constructing  a  simulation  or  a  real-time 
receiver.  To  avoid  the  notational  complexity  that  may  occur  with  arbitrary 
receiver  dimensions,  the  equations  are  given  in  terms  of  an  8-ary  system 
with N = 16 complex samples per observation interval.

A  double  index  notation  is  used  to  keep  track  of  the  observation 
intervals  and  functions  of  frequency  within  each  observation  interval.  The 
index  of  observation  intervals  is  m  and  the  frequency  index  is  k.  Where 
a  time  sample  index  is  required,  for  time  samples  within  an  observation 
interval,  the  variable  n  is  the  index  of  time  samples. 

Figure F-1 is a block diagram of an adaptive M-ary FSK receiver that
illustrates the flow of computations from the time samples to the modulation
decision. The following equations give the explicit operations represented
by each block of the diagram. Scalar notation, as opposed to matrix-vector
notation, is used wherever possible, for the convenience of the
programmer.


During each observation interval N = 16 complex valued DFT coefficients
are computed from time samples of the in-phase and quadrature
waveforms x_{n,m} and y_{n,m}. These are formed into the complex valued function:

    x_{n,m} + j y_{n,m},    n = 0,...,15                                               F.1

for a given observation interval m and transformed to DFT coefficients:

    z_{k,m} = \sum_{n=0}^{15} (x_{n,m} + j y_{n,m}) e^{-j 2\pi k n / 16},    k = 0,...,15      F.2

Figure F-1. Block diagram of an adaptive receiver with data reduction.


The first operation on these coefficients is to compute the square
of the magnitude of each, |z_{k,m}|^2, k = 0,...,15. These values are tested
against M = 8 different inverse spectra derived from the estimates
\hat{\sigma}^{-2}_{k,m-1} in the operation labeled "decision logic" in the block
diagram. The estimates \hat{\sigma}^{-2}_{k,m-1} are arranged so that k = 0
corresponds to the center of the signal portion of the spectrum. k = 1 is the
next higher frequency estimate; k = 15 is the next to center on the lower
frequency side, and so forth. (The index m-1 indicates that the estimate is
based on modulation intervals prior to the mth and not including the mth
interval.) The modulation decision is the index \ell_{min}(m) of the smallest
inner product of the two vectors whose elements are |z_{k,m}|^2 and circularly
shifted versions of \hat{\sigma}^{-2}_{k,m-1}. These inner products are:

    p_\ell(m) = \sum_{k=0}^{15} |z_{k,m}|^2 \hat{\sigma}^{-2}_{\{k-\ell\},m-1},    \ell = \{-4,-3,...,3\}      F.3

and \ell_{min} is given by

    p_{\ell_{min}}(m) \le p_\ell(m),    \ell = \{-4,-3,...,3\}                          F.4

where {k-\ell} is a modulo 16 addition, indicating that the inverse spectrum
is wrapped around by the shift operation. The range of \ell indicated will
shift the center point of the spectrum into the eight possible locations
where the center frequency of |z_{k,m}|^2 is expected.


The modulation decision \ell_{min} is used to align the |z_{k,m}|^2 for
insertion into the estimator of the signal spectrum. This operation is
labeled "shift" on the block diagram. The realigned version of the spectrum
is designated |z'_{k,m}|^2 where

    |z'_{k,m}|^2 = |z_{\{k+\ell_{min}\},m}|^2                                           F.5

When the modulation decision is correct, the center of the signal spectrum
is shifted to the first entry of the realigned set |z'_{k,m}|^2. The next higher
frequency is shifted to the k = 1 location; the next lower frequency is
shifted to the k = 15 location, etc. A certain fraction of the spectra will
be incorrectly aligned when demodulation errors occur. These errors have
proven to be inconsequential to the receiver performance in the useful
operating region of the receiver operating characteristic curves.
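
The decision logic of Equations F.3 and F.4 and the shift of Equation F.5 can be sketched as two small functions. The function names and the test spectrum `sigma2` are illustrative assumptions, not values from the report; the spectrum is simply a peaked shape centered on k = 0.

```python
N = 16
SHIFTS = range(-4, 4)  # l = -4, -3, ..., 3: the eight candidate alignments


def decide(mag2, inv_spec):
    """Return l_min, the shift giving the smallest inner product (F.3, F.4)."""
    def inner_product(l):
        return sum(mag2[k] * inv_spec[(k - l) % N] for k in range(N))
    return min(SHIFTS, key=inner_product)


def realign(mag2, l_min):
    """Circularly shift so the decided center lands on k = 0 (F.5)."""
    return [mag2[(k + l_min) % N] for k in range(N)]


# Hypothetical spectral estimate centered on k = 0.
sigma2 = [10.0, 4.0, 1.5] + [1.0] * 11 + [1.5, 4.0]
inv_spec = [1.0 / s for s in sigma2]

# Noise-free "received" spectra whose centers sit at bins +2 and -3.
received_up = [sigma2[(k - 2) % N] for k in range(N)]
received_down = [sigma2[(k + 3) % N] for k in range(N)]

l_up = decide(received_up, inv_spec)
l_down = decide(received_down, inv_spec)
aligned = realign(received_up, l_up)
print(l_up, l_down, aligned == sigma2)
```

With a noise-free input the inner product at the correct shift is a sum of sixteen ones, and every other shift pairs large spectral values with large inverse-spectrum values, which raises the sum.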

The realigned data |z'_{k,m}|^2 are subjected to the data reduction
algorithm which allows the average over prior observation intervals to be
made with only a few storage registers (four or five for the receiver
dimensions and conditions of interest here). Some discussion of the
coefficients of the data reduction algorithm precedes the introduction of the
data reduction equations.


The two techniques for data reduction which were investigated in
Sections V and VI do not differ in terms of the algorithm, but only in a
certain set of coefficients. These are 16 x P in number and are most
naturally organized as the coordinates of P 16-dimensional vectors,
where P = 4 or P = 5. A set of four vectors for the BLSS algorithm is

    x_1 = [1 0 0 ... 0]^T
    x_2 = 2^{-1/2} [0 1 0 0 ... 0 1]^T
    x_3 = 2^{-1/2} [0 0 1 0 0 ... 0 1 0]^T                                             F.6
    x_4 = 11^{-1/2} [0 0 0 1 1 1 ... 1 1 0 0]^T

They are normalized so that

    x_i^T x_j = \delta_{ij}                                                            F.7

The first three vectors are arranged to make independent estimates of the
center of the spectrum, the pair of adjacent points, and the pair of
next-to-adjacent points. The fourth vector is structured to obtain a noise
level estimate by averaging over the remaining points.
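
As a quick check of the normalization, the four BLSS vectors of Equation F.6 can be built and their inner products tabulated. The helper name `unit` is illustrative; the index sets are read directly from the vector patterns in F.6 (ones at k = 3 through 13 give the factor 11^{-1/2}).

```python
import math

N = 16


def unit(indices):
    """Vector with equal entries on the given indices, scaled to unit length."""
    v = [0.0] * N
    for k in indices:
        v[k] = 1.0 / math.sqrt(len(indices))
    return v


blss = [
    unit([0]),                  # x_1: center of the spectrum
    unit([1, 15]),              # x_2: pair of adjacent points
    unit([2, 14]),              # x_3: pair of next-to-adjacent points
    unit(list(range(3, 14))),   # x_4: remaining eleven points (noise level)
]

gram = [[sum(a * b for a, b in zip(u, v)) for v in blss] for u in blss]
print(gram)
```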


A similar set of vectors is used in the SEV algorithm. These
are computed through an eigenvector-eigenvalue numerical analysis of a 16 x 16
matrix S. The eigenvectors of S corresponding to nonzero eigenvalues are
the x_i vectors in the algorithm. The matrix S is a sum of dyads of
spectra from the predicted operating range of the receiver. In matrix-vector
notation:

    S = \sum_{i=1}^{J} \sigma_i^2 \sigma_i^{2T}                                        F.8

where the vectors \sigma_i^2 are defined by Equation 67 of Section III. In scalar
notation, the elements of S are

    [S]_{n,\ell} = \sum_{i=1}^{J} \sigma^2_{i,n} \sigma^2_{i,\ell},    n = 0,...,15
                                                                        \ell = 0,...,15      F.9

Since S is symmetric, n may be considered the index of rows and \ell the
index of columns or vice versa. Each of the vectors \sigma_i^2 is a
sixteen-dimensional discrete power spectral density (expectation or mean value
of the magnitude-square DFT coefficients of the received signal-plus-noise
random process). For the results given in Section VI, eight \sigma_i^2 were
selected along the line of 10 percent error probability from the receiver
operating characteristic curve of the optimal receiver (Figure 12). This
set gave excellent performance, very close to optimal, throughout the useful
operating range of the (optimal) receiver. The data reduction equation is


    c'_{i,m} = \sum_{n=0}^{15} x_{i,n} |z'_{n,m}|^2,    i = 1,...,P                     F.10

The c'_{i,m} are averaged over prior observation intervals by a set of single
pole digital filters

    c_{i,m} = K c_{i,m-1} + c'_{i,m}                                                   F.11

where K < 1 is the pole location of each of the filters. The appropriate
value of K for a given time constant of Q observation intervals is given
by

    K = e^{-1/Q}                                                                       F.12


The averaged coefficients c_{i,m} are expanded into a sixteen
dimensional spectral estimate by the expression

    \hat{\sigma}^2_{k,m} = \sum_{i=1}^{P} c_{i,m} x_{i,k},    k = 0,...,15              F.13

This operation is referred to as an expander in the block diagram. These
estimates are subsequently inverted

    \hat{\sigma}^{-2}_{k,m} = [ \hat{\sigma}^2_{k,m} ]^{-1},    k = 0,...,15            F.14

to obtain the inverse spectrum required by the receiver.


The time delay T shown in the diagram does not actually occur
as a final step in the sequence of operations. It is inserted to show
that the estimate used in the decision logic at the mth modulation interval
will not include data from the mth step, which is not available until the
mth decision has been made.
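
The chain from data reduction through inversion (Equations F.10 to F.14) can be sketched end to end. Several things here are assumptions made for illustration: the recursion is taken as an unnormalized single-pole accumulator, so the estimate carries a constant gain of 1/(1 - K) that does not affect the decision of F.4; the BLSS vectors stand in for the coefficient set; and the input is a fixed noise-free spectrum rather than receiver data.

```python
import math

N = 16


def unit(indices):
    v = [0.0] * N
    for k in indices:
        v[k] = 1.0 / math.sqrt(len(indices))
    return v


# BLSS coefficient vectors, used here only as an example coefficient set.
VECTORS = [unit([0]), unit([1, 15]), unit([2, 14]), unit(list(range(3, 14)))]


def reduce_spectrum(mag2_aligned, vectors):
    """F.10: project the realigned spectrum onto each coefficient vector."""
    return [sum(x[n] * mag2_aligned[n] for n in range(N)) for x in vectors]


def pole_for_time_constant(q_intervals):
    """F.12: pole location for a time constant of Q observation intervals."""
    return math.exp(-1.0 / q_intervals)


def update_averages(c_prev, reduced, pole):
    """F.11, assumed form: single-pole recursive average of each coefficient."""
    return [pole * cp + d for cp, d in zip(c_prev, reduced)]


def expand(c, vectors):
    """F.13: rebuild a 16-point spectral estimate from the P coefficients."""
    return [sum(c[i] * vectors[i][k] for i in range(len(vectors)))
            for k in range(N)]


def invert(spec):
    """F.14: element-by-element inverse spectrum."""
    return [1.0 / s for s in spec]


K = pole_for_time_constant(10.0)
spectrum = [10.0, 4.0, 1.5] + [1.0] * 11 + [1.5, 4.0]  # hypothetical input

c = [0.0] * len(VECTORS)
for _ in range(400):  # long enough for the filter to settle
    c = update_averages(c, reduce_spectrum(spectrum, VECTORS), K)

estimate = expand(c, VECTORS)
inverse_spectrum = invert(estimate)
rescaled = [(1.0 - K) * e for e in estimate]  # remove the filter gain
print(rescaled)
```

Only P = 4 numbers are carried between observation intervals, which is the storage saving the data reduction is meant to provide.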
APPENDIX G


PRINCIPAL COMPONENTS AND THE REDUCED
DIMENSION SPECTRAL ESTIMATE


The connection between the data reduction technique incorporated
in the SEV algorithm and principal component analysis in statistics is
explored. It is shown that whereas principal components are designed to
retain as much as possible of the covariance structure of a random vector
in a reduced dimensional representation, the coefficients of the SEV algorithm
are designed to minimize the combined covariance and squared bias of the
reduced dimension spectral estimate.

PRINCIPAL  COMPONENT  ANALYSIS 

Principal  component  analysis  was  introduced  by  H.  Hotelling1*"  in  1933. 
It is a technique of deriving a linear transformation of a random vector which
results in a lower dimension vector while preserving as much of the variance
of the original vector as possible. Given a random N-vector x with
covariance matrix \Sigma, it is desired to construct a P-vector

    y = H^T x                                                                          G.1

where H is an N x P matrix with orthonormal columns (H^T H = I) so that the
total variation of y, defined as the sum of variances of the components, is
maximized. The solution is given by a matrix H whose columns are the
eigenvectors of \Sigma associated with the P greatest eigenvalues.
The optimality of principal components may also be viewed in terms
of maximizing the trace of the covariance matrix (maximizing the total
variation) of an approximation to x in its same coordinates. Consider
reconstructing an N-dimensional variable \hat{x} from the principal component
vector y

    \hat{x} = H y = H H^T x                                                            G.2

The covariance matrix of \hat{x} is

    E\{\hat{x} \hat{x}^T\} = H H^T \Sigma H H^T                                        G.3

which has rank P < N. The vector \hat{x} has a singular covariance matrix since
it has less than N degrees of freedom. If the columns of H are taken
as the eigenvectors of \Sigma associated with the largest eigenvalues then the

    tr\{H H^T \Sigma H H^T\} = tr\{H^T \Sigma H\}                                      G.4

is maximized over the class of N x P matrices H for which H^T H = I. It
may  also  be  demonstrated 11 5  that  the 

    Norm\{\Sigma - H H^T \Sigma H H^T\}                                                G.5

is minimized by the same H where the Euclidean norm

    Norm\{A\} = [ \sum_{i,j} [A]^2_{ij} ]^{1/2}                                        G.6

is used. Thus the covariance matrix of \hat{x} is the best approximation to \Sigma
in a mean square sense, by a rank P matrix.
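
For P = 1 the principal component result says that the leading eigenvector of the covariance matrix retains more variance than any other single direction. The sketch below illustrates this with power iteration, which is used here only because it needs no matrix library; the matrix `SIGMA` is a made-up covariance and the function names are illustrative.

```python
import math

# Hypothetical 3 x 3 covariance matrix (symmetric, positive definite).
SIGMA = [[4.0, 1.0, 0.5],
         [1.0, 3.0, 0.2],
         [0.5, 0.2, 1.0]]


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def leading_eigenvector(M, iterations=500):
    """Power iteration: repeatedly apply M and renormalize."""
    v = [1.0] * len(M)
    for _ in range(iterations):
        w = matvec(M, v)
        norm = math.sqrt(dot(w, w))
        v = [wi / norm for wi in w]
    return v


h = leading_eigenvector(SIGMA)            # single column of H for P = 1
retained = dot(h, matvec(SIGMA, h))       # tr{H^T Sigma H}, as in G.4
residual = max(abs(a - retained * b) for a, b in zip(matvec(SIGMA, h), h))
axis_variances = [SIGMA[i][i] for i in range(3)]  # variance kept by each coordinate axis
print(retained, axis_variances)
```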

DATA  REDUCED  SPECTRAL  ESTIMATE 

In reducing the dimension of the power spectrum estimate used in
the adaptive FSK receiver we are confronted with a problem similar to that
of principal components. Here it is desired to derive an N x N linear
projection operator K = X X^T of rank P < N so that the raw spectral
estimate may be replaced with a reduced estimate


G.7


in such a way that the least additional cost, in terms of increased
probability of misclassification, is incurred. In Section V it is shown that a
formal Taylor expansion of the cost, about its value at the correct spectrum,
with truncation at the quadratic term yields the risk function


ae(c~, K) 


=  tr  {C^ 


E{K  a2  a2r 
- s  — s 


kt}} 


G .  8 


The  expansion  is  referred  to  as  formal  since  the  weight  matrix  cannot 

be  practically  evaluated.  Each  component  of  the  matrix 

M  =  E{K  a2  $  KT)  ,  G. 9 

—0  - s  — s  — 

which may be referred to as the total squared deviation, is weighted by the
corresponding entry of C to assign the relative importance of deviations
of the estimate \hat{\sigma}^2 from the true spectrum \sigma^2. Equation G.9 can be
broken down into two terms

    M = K D K^T + (K - I) \sigma^2 \sigma^{2T} (K - I)^T                               G.10

which represent the covariance of the reduced spectral estimate and squared
bias respectively. This breakdown of the error, from which the term "total
squared deviation" derives, is familiar from the literature of power spectral
density estimation as a measure of the quality of window functions.


It is necessary to average the risk function over an ensemble of
power spectra \sigma^2 that are predicted for the receiver. Otherwise the
optimal K would depend on the particular spectrum \sigma^2 and would change
with the spectrum rather than be fixed for the entire range of fading rates
and signal-to-noise ratios. In performing this average it is desired to
retain the form of the risk function as the trace of the product of two
matrices.


G.  11 


A  weighted  average  C  as  indicated  by  Equation  G.ll  will  exist  as  long  as 
is  nonsingular 


L 

l  M 

i=l  a. 
1 


C 


C 

— o .  ~~o . 
1  1 


]T  M 
i=l  l 


g.  i: 


It is expected that the desired inverse will exist since each component in
the sum is invertible and there is no reason to believe that the terms will
combine in such a way that the rank of the total will be reduced. It is
therefore possible to consider the cost function

    tr\{C M\} = tr\{ C [ K D K^T + (K - I) S (K - I)^T ] \}                            G.13

where

    D = \frac{1}{L} \sum_{i=1}^{L} D_{\sigma_i}                                         G.14

and

    S = \frac{1}{L} \sum_{i=1}^{L} \sigma_i^2 \sigma_i^{2T}                             G.15

In Section V, the approach taken to deriving the best projection operator K
was to minimize the second term in Equation G.13, that corresponding to
squared bias, while ignoring the covariance term. It was discovered
experimentally that the matrix S was of considerably low rank compared to its
dimension and therefore (K - I) was chosen so that

    (K - I) = Y Y^T                                                                    G.16
where the N x (N-P) matrix Y is made up of orthogonal columns which lie in
the null space of S. The particular circumstances of the problem allowed
the spectrum estimate to be reduced in dimension without paying for the data
reduction in the bias term of the cost. Here we have used essentially the
same approach to deriving a projection operator as that given by principal
components. The problem is to minimize tr{C Y Y^T S Y Y^T} rather than
maximize an unweighted tr{H H^T \Sigma H H^T}. The eigenvectors of S associated
with the minimum eigenvalues (all approximately zero) were selected. In both
cases the extreme values of the criterion function are achieved by using a
set of eigenvectors (though the null space eigenvectors of S are not
unique). Furthermore, S is not a covariance matrix like \Sigma in the
principal component analysis, but rather a squared mean of the raw statistic
\hat{\sigma}^2. The fact that S is not full rank is fortunate, since otherwise the
solution would depend on C which is practically unavailable. This leaves
open the question of the minimization of Equation G.13 where S may not be
of sufficiently reduced rank and C is available or an appropriate C could
be hypothesized. In the remainder of this appendix the equations are given
for the general case.

OPTIMAL  PROJECTION  OPERATOR 

Equation G.13 may be expanded to

    tr\{C M\} = tr\{ C K D K + C K S K - C S K - C K S + C S \}
              = tr\{ C K (D + S) K - (C S + S C) K + C S \}                            G.17

Here, the property of the trace that

    tr\{A B\} = tr\{B A\}                                                              G.18

along with the additional assumption K = K^T have been used. Now substituting
K = X X^T and making further use of the trace property and the
assumption that X^T X = I yields

    tr\{C M\} = tr\{ X^T C X X^T (D + S) X - X^T (C S + S C) X \} + tr\{C S\}           G.19
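
The expansion from G.13 to G.17 and G.19 uses only the cyclic property of the trace and the symmetry of K, so it can be verified with arbitrary symmetric matrices. In the sketch below the matrices `C`, `D`, `S` and the orthonormal columns of `X` are made-up test values with N = 3 and P = 2.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


# Hypothetical symmetric test matrices.
C = [[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]]
D = [[1.0, 0.2, 0.0], [0.2, 0.8, 0.1], [0.0, 0.1, 0.5]]
S = [[3.0, 1.0, 0.5], [1.0, 2.0, 0.4], [0.5, 0.4, 1.0]]
X = [[1.0, 0.0], [0.0, 0.6], [0.0, 0.8]]   # orthonormal columns, X^T X = I

I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
Xt = transpose(X)
K = matmul(X, Xt)                          # projection operator K = X X^T
KmI = sub(K, I3)
DS = add(D, S)
CS_SC = add(matmul(C, S), matmul(S, C))

# G.13: tr{C [K D K^T + (K - I) S (K - I)^T]}
g13 = trace(matmul(C, add(matmul(matmul(K, D), transpose(K)),
                          matmul(matmul(KmI, S), transpose(KmI)))))
# G.17: tr{C K (D + S) K - (C S + S C) K + C S}
g17 = trace(sub(matmul(matmul(matmul(C, K), DS), K),
                matmul(CS_SC, K))) + trace(matmul(C, S))
# G.19: tr{X^T C X X^T (D + S) X - X^T (C S + S C) X} + tr{C S}
g19 = trace(sub(matmul(matmul(matmul(Xt, C), X), matmul(matmul(Xt, DS), X)),
                matmul(matmul(Xt, CS_SC), X))) + trace(matmul(C, S))
print(g13, g17, g19)
```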
Now the object is to find the N x P matrix X of orthonormal
columns which minimizes tr{C M}. The extreme points of this function may
be found by the usual gradient techniques with the addition of Lagrange
multipliers. The equations describing the column vectors of X at these
extreme points are

    \frac{\partial}{\partial x_a} [ tr\{C M\} - \sum_{j=1}^{P} \lambda_j (x_j^T x_j - 1) ] = 0,    a = 1,...,P      G.20

    \frac{\partial}{\partial \lambda_b} [ tr\{C M\} - \sum_{j=1}^{P} \lambda_j (x_j^T x_j - 1) ] = 0,    b = 1,...,P      G.21


where the $\lambda_j$ are Lagrange multipliers arranged to constrain the
column vectors of $X$ to unit length.  It is not necessary to constrain
these column vectors to be orthogonal since any rank $P$ solution to Equation
G.20 is an $X$ with orthogonal columns without such a constraint.  In carrying
out the first of these partial derivatives it is convenient to write out


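$$
\begin{aligned}
\mathrm{tr}\{C M\}
 &= \sum_{i=1}^{P} x_i^T C \sum_{j=1}^{P} x_j x_j^T (D+S)\, x_i
    - \sum_{k=1}^{P} x_k^T (C S + S C)\, x_k + \mathrm{tr}\{C S\} \\
 &= \sum_{\substack{i=1 \\ i \neq a}}^{P} x_i^T C
    \sum_{\substack{j=1 \\ j \neq a}}^{P} x_j x_j^T (D+S)\, x_i \\
 &\quad + \sum_{\substack{i=1 \\ i \neq a}}^{P} x_i^T C\, x_a x_a^T (D+S)\, x_i \\
 &\quad + x_a^T C \sum_{\substack{j=1 \\ j \neq a}}^{P} x_j x_j^T (D+S)\, x_a \\
 &\quad + x_a^T C\, x_a x_a^T (D+S)\, x_a \\
 &\quad - \sum_{k=1}^{P} x_k^T (C S + S C)\, x_k + \mathrm{tr}\{C S\}
\end{aligned}
\tag{G.22}
$$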
The second and third terms of this expression are equal.  The first term does
not enter the partial derivative since it is not a function of $x_a$, nor does
the last term.  Proceeding:
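The intermediate equations that followed this sentence are not legible in
this copy.  A reconstruction consistent with Equations G.22 and G.25 is
sketched below.  It assumes that C, D and S are symmetric and that the
constraint term enters the Lagrangian with a minus sign; the last line is
presumably the per-column eigenvector equation that the text cites as
Equation G.24.

```latex
\begin{aligned}
\frac{\partial}{\partial x_a}\,\mathrm{tr}\{C M\}
  &= 2\,\bigl[\, C X X^T (D+S) + (D+S) X X^T C - C S - S C \,\bigr]\, x_a , \\
\frac{\partial}{\partial x_a} \sum_{j=1}^{P} \lambda_j \,(x_j^T x_j - 1)
  &= 2\,\lambda_a x_a , \\
\bigl[\, C X X^T (D+S) + (D+S) X X^T C - C S - S C \,\bigr]\, x_a
  &= \lambda_a x_a , \qquad a = 1,\ldots,P .
\end{aligned}
```

The factor of two in the first line collects the equal second and third
terms of Equation G.22 together with the derivative of the fourth term.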


From Equation G.24 one may deduce that, if $P$ linearly independent solutions
exist, they will form an orthogonal set (or can be arranged to do so) since
they are eigenvectors of a real symmetric matrix.  Each of the matrices $D$,
$C$, and $S$ is itself symmetric, and $X X^T$ is symmetric regardless of
whether the columns of $X$ are orthogonal.  The matrix operating on $x_a$ is
in the form of one matrix plus its transpose, a sum which is always symmetric.
The fact that the variable matrix $X$ is embedded in the symmetric matrix
does not alter the conclusion about the orthogonality of a given set of $P$
vectors $x_a$ which satisfy the equation.  It is convenient to arrange all of
the $P$ equations into one matrix equation

$$ [\,C X X^T (D+S) + (D+S) X X^T C - C S - S C\,]\, X = X \Lambda \tag{G.25} $$
where $\Lambda$ is a diagonal matrix whose diagonal elements are the
Lagrange multipliers.  The second of the two partial derivative equations,
Equation G.21, yields the constraint

$$ x_i^T x_i = 1, \qquad i = 1,\ldots,P \tag{G.26} $$
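The bracketed matrix of Equation G.25 can be checked against a
finite-difference gradient of the trace in Equation G.19.  The sketch below
is an illustration under stated assumptions, not part of the original
derivation: C, D and S are random symmetric stand-ins, X is an arbitrary
N x P matrix, and the function names are hypothetical.  It also confirms that
the bracketed matrix is symmetric.

```python
import numpy as np

def sym(a):
    """Symmetrize a square matrix."""
    return (a + a.T) / 2.0

def criterion(C, D, S, X):
    """Equation G.19: tr{X^T C X X^T (D+S) X - X^T (CS+SC) X} + tr{C S}."""
    quad = X.T @ C @ X @ X.T @ (D + S) @ X
    lin = X.T @ (C @ S + S @ C) @ X
    return np.trace(quad - lin) + np.trace(C @ S)

def bracket_matrix(C, D, S, X):
    """Matrix in brackets on the left-hand side of Equation G.25."""
    K = X @ X.T
    return C @ K @ (D + S) + (D + S) @ K @ C - C @ S - S @ C

def numerical_gradient(f, X, step=1e-6):
    """Central-difference gradient of scalar f with respect to each entry of X."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = step
        grad[idx] = (f(X + E) - f(X - E)) / (2.0 * step)
    return grad

rng = np.random.default_rng(1)
N, P = 5, 2
C, D, S = (sym(rng.standard_normal((N, N))) for _ in range(3))
X = rng.standard_normal((N, P))

B = bracket_matrix(C, D, S, X)
analytic = 2.0 * B @ X                      # gradient implied by Equation G.25
numeric = numerical_gradient(lambda Z: criterion(C, D, S, Z), X)
max_error = np.max(np.abs(analytic - numeric))
is_symmetric = np.allclose(B, B.T)
print(max_error, is_symmetric)
```

Agreement between the two gradients supports the form of the bracketed
matrix; it says nothing about whether a rank P stationary point exists.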

Equations G.25 and G.26 do not generally have a solution of rank $P$, although
there is always a rank one solution.  If $X$ is of rank one then
$X X^T = P\, x_a x_a^T$ and Equation G.24 becomes

$$ [\,P\, C x_a x_a^T (D+S) + P\,(D+S)\, x_a x_a^T C - (C S + S C)\,]\, x_a = \lambda_a x_a $$

Then if $x_a$ is an eigenvector of

$$ \alpha_1 C + \alpha_2 (D+S) - C S - S C $$

where

$$ \alpha_1 = P\, x_a^T (D+S)\, x_a , \qquad \alpha_2 = P\, x_a^T C\, x_a , $$

the matrix $X$ made up of $P$ columns all equal to $x_a$ will satisfy the
extremal equations.
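The rank one reduction can be verified directly.  In the sketch below, which
is illustrative only, x is an arbitrary unit vector rather than a solution of
the eigenvector condition, C, D and S are random symmetric stand-ins, and the
coefficient alpha2 = P x^T C x is inferred from the reduction itself.

```python
import numpy as np

def sym(a):
    """Symmetrize a square matrix."""
    return (a + a.T) / 2.0

rng = np.random.default_rng(4)
N, P = 5, 3
C, D, S = (sym(rng.standard_normal((N, N))) for _ in range(3))

x = rng.standard_normal(N)
x /= np.linalg.norm(x)               # unit length, as Equation G.26 requires
X = np.tile(x[:, None], (1, P))      # P identical columns, so X has rank one

K = X @ X.T
bracket = C @ K @ (D + S) + (D + S) @ K @ C - C @ S - S @ C

alpha1 = P * (x @ (D + S) @ x)
alpha2 = P * (x @ C @ x)             # inferred coefficient (assumption)
reduced = alpha1 * C + alpha2 * (D + S) - C @ S - S @ C

projector_ok = np.allclose(K, P * np.outer(x, x))
reduction_ok = np.allclose(bracket @ x, reduced @ x)
print(projector_ok, reduction_ok)
```

Because alpha1 and alpha2 themselves depend on x, the eigenvector condition
is nonlinear in x; the check above only confirms the algebra of the
reduction.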


Placing additional constraints that the columns of $X$ be orthogonal
yields the same Equation G.25 with the exception that the $P \times P$ matrix
$\Lambda$ is a general real symmetric matrix rather than a diagonal matrix.
Such a matrix may be written in the form $G \Lambda' G^T$ where $G$ is an
orthogonal matrix and $\Lambda'$ is diagonal:

$$ [\,C X X^T (D+S) + (D+S) X X^T C - C S - S C\,]\, X = X\, G \Lambda' G^T $$

This equation still requires that the columns of $X$ define an invariant
subspace of the matrix in brackets, a condition which is evidently not
possible to satisfy.  It also indicates that $X G$ is a solution to the
original equation.
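The remark that X G is also a solution reflects the fact that the bracketed
matrix depends on X only through X X^T.  A short numerical illustration,
using an arbitrary orthonormal X and an arbitrary orthogonal G (both
hypothetical test inputs), is:

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 6, 3

X, _ = np.linalg.qr(rng.standard_normal((N, P)))   # N x P, orthonormal columns
G, _ = np.linalg.qr(rng.standard_normal((P, P)))   # P x P orthogonal matrix

XG = X @ G
same_projector = np.allclose(XG @ XG.T, X @ X.T)       # (XG)(XG)^T = X X^T
still_orthonormal = np.allclose(XG.T @ XG, np.eye(P))  # columns stay orthonormal
print(same_projector, still_orthonormal)
```

Any criterion that depends on X only through X X^T, such as Equation G.17
with K = X X^T, is therefore unchanged when X is replaced by X G.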
One may conclude tentatively from this result that the appropriate
partial derivatives of the trace (Equation G.17) do not go to zero at any
points $X X^T$ of rank $P > 1$ for which $X X^T$ is a linear projection
operator.  Thus it is possible that the Lagrangian techniques will not be
useful to determine the minimizing linear projection operator.

The means to minimize Equation G.24 by either analytical or
numerical methods are still under investigation.  The difficulty with this
problem seems to arise from the fact that one is searching for a subspace,
that spanned by the columns of $X$, rather than for a unique vector as in
the usual problem of the minimization of a functional.  The solution appears
to be of fundamental interest as a means to improve multidimensional
estimators by data reduction.
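For illustration only, one generic numerical approach to a subspace search of
this kind is a gradient step followed by re-orthonormalization of the
columns.  The sketch below is not the method of this report and makes several
assumptions: the criterion is taken to be the trace of Equation G.19, the
matrices C, D and S are random positive semidefinite stand-ins, QR
factorization is used to restore orthonormal columns, and a step is accepted
only if it lowers the criterion.  At best it finds a local stationary
subspace.

```python
import numpy as np

def criterion(C, D, S, X):
    """Equation G.19 evaluated at X."""
    quad = X.T @ C @ X @ X.T @ (D + S) @ X
    lin = X.T @ (C @ S + S @ C) @ X
    return np.trace(quad - lin) + np.trace(C @ S)

def gradient(C, D, S, X):
    """Unconstrained gradient of Equation G.19 with respect to X."""
    K = X @ X.T
    return 2.0 * (C @ K @ (D + S) + (D + S) @ K @ C - C @ S - S @ C) @ X

def descend(C, D, S, X0, iters=200, step0=1.0):
    """Gradient steps with QR re-orthonormalization and backtracking."""
    X = X0
    value = criterion(C, D, S, X)
    history = [value]
    for _ in range(iters):
        grad = gradient(C, D, S, X)
        step = step0
        while step > 1e-12:
            trial, _ = np.linalg.qr(X - step * grad)  # orthonormal columns again
            trial_value = criterion(C, D, S, trial)
            if trial_value < value:                   # accept only improvements
                X, value = trial, trial_value
                break
            step *= 0.5
        else:
            break  # no step size lowered the criterion; stop
        history.append(value)
    return X, history

def random_psd(n, rng):
    """Random positive semidefinite matrix (illustrative stand-in)."""
    a = rng.standard_normal((n, n))
    return a @ a.T / n

rng = np.random.default_rng(3)
N, P = 8, 3
C, D, S = (random_psd(N, rng) for _ in range(3))
X0, _ = np.linalg.qr(rng.standard_normal((N, P)))

X_min, history = descend(C, D, S, X0)
print(history[0], history[-1])
```

Since only X X^T matters, the result should be read as a subspace rather than
as a particular set of column vectors.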